09 Feb, 2017

1 commit

  • commit 3a4b77cd47bb837b8557595ec7425f281f2ca1fe upstream.

    Ralf Spenneberg reported a kernel crash when mounting a modified ext4
    image. The kernel crashed while calculating the fs overhead in
    ext4_calculate_overhead(): the image has a very large s_first_meta_bg
    (debug code shows it's 842150400), and ext4 overruns the bitmap buffer,
    which is only PAGE_SIZE, when setting bits in it in count_overhead().

    ext4_calculate_overhead():
    buf = get_zeroed_page(GFP_NOFS); [...] 0; j--) {
    [...]

    Signed-off-by: Eryu Guan
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Andreas Dilger
    Signed-off-by: Greg Kroah-Hartman

    Eryu Guan
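
The excerpt above is cut off before it describes the fix, so the following is only a sketch of one plausible mount-time guard, not the upstream patch. It assumes the check compares s_first_meta_bg against the number of group descriptor blocks; the function names, parameters, and the exact bound (`<=`) are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: how many descriptor blocks are needed for
 * `groups` block groups when `desc_per_block` descriptors fit in one block.
 */
uint64_t desc_block_count(uint64_t groups, uint64_t desc_per_block)
{
    return (groups + desc_per_block - 1) / desc_per_block;
}

/*
 * Returns true when first_meta_bg cannot index past the descriptor
 * blocks. A mount path could refuse the image when this returns false,
 * so the loop that sets bits in the PAGE_SIZE bitmap never sees a huge
 * iteration count.
 */
bool first_meta_bg_is_sane(uint32_t first_meta_bg, uint32_t groups,
                           uint32_t desc_per_block)
{
    if (desc_per_block == 0)
        return false; /* avoid dividing by zero on a corrupt image */
    return first_meta_bg <= desc_block_count(groups, desc_per_block);
}
```

With the reported value, `first_meta_bg_is_sane(842150400, 64, 32)` returns false, because 64 groups at 32 descriptors per block need only 2 descriptor blocks.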
     

06 Jan, 2017

9 commits

  • commit 73b92a2a5e97d17cc4d5c4fe9d724d3273fb6fd2 upstream.

    Currently data journaling is incompatible with encryption: enabling both
    at the same time has never been supported by design, and would result in
    unpredictable behavior. However, users are not precluded from turning on
    both features simultaneously. This change programmatically replaces data
    journaling for encrypted regular files with ordered data journaling mode.

    Background:
    Journaling encrypted data has not been supported because it operates on
    buffer heads of the page in the page cache. Namely, when the commit
    happens, which could be up to five seconds after caching, the commit
    thread uses the buffer heads attached to the page to copy the contents of
    the page to the journal. With encryption, it would have been required to
    keep the bounce buffer with ciphertext for up to the aforementioned five
    seconds, since the page cache can only hold plaintext and could not be
    used for journaling. Alternatively, it would be required to set up the
    journal to initiate a callback at the commit time to perform deferred
    encryption - in this case, not only would the data have to be written
    twice, but it would also have to be encrypted twice. This level of
    complexity was not justified for a mode that in practice is very rarely
    used because of the overhead from the data journaling.

    Solution:
    If data=journal has been set as a mount option for a filesystem, or if
    journaling is enabled on a regular file, do not perform journaling if the
    file is also encrypted; instead, fall back to the data=ordered mode for
    the file.

    Rationale:
    The intent is to allow seamless and proper filesystem operation when
    journaling and encryption have both been enabled, and have these two
    conflicting features gracefully resolved by the filesystem.

    Fixes: 4461471107b7
    Signed-off-by: Sergey Karamov
    Signed-off-by: Theodore Ts'o
    Signed-off-by: Greg Kroah-Hartman

    Sergey Karamov
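
The fallback rule in the Solution paragraph can be sketched as a small decision function. This is an illustrative userspace model under stated assumptions: the enum, struct, and function names are hypothetical, and it does not reproduce the kernel's actual mode-selection code.

```c
#include <stdbool.h>

/* Hypothetical labels for the three data modes. */
enum data_mode { DATA_JOURNAL, DATA_ORDERED, DATA_WRITEBACK };

/* Hypothetical per-file facts the decision depends on. */
struct file_info {
    bool is_regular;    /* regular file, not a directory or symlink */
    bool is_encrypted;  /* file has an encryption policy */
    bool journal_flag;  /* journaling requested on this file */
};

/*
 * Pick the data mode that actually applies to one file. Journaling is
 * requested either by the mount option or by the per-file flag; an
 * encrypted regular file is downgraded to ordered mode.
 */
enum data_mode effective_data_mode(enum data_mode mount_mode,
                                   const struct file_info *f)
{
    bool wants_journal = (mount_mode == DATA_JOURNAL) || f->journal_flag;

    if (!wants_journal)
        return mount_mode;
    if (f->is_regular && f->is_encrypted)
        return DATA_ORDERED;
    return DATA_JOURNAL;
}
```

An unencrypted file on a data=journal mount still gets `DATA_JOURNAL`; only the encrypted regular file is redirected.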
     
  • commit 578620f451f836389424833f1454eeeb2ffc9e9f upstream.

    We should set the error code if kzalloc() fails.

    Fixes: 67cf5b09a46f ("ext4: add the basic function for inline data support")
    Signed-off-by: Dan Carpenter
    Signed-off-by: Theodore Ts'o
    Signed-off-by: Greg Kroah-Hartman

    Dan Carpenter
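
The pattern this one-line message refers to can be illustrated in userspace C. The sketch below uses a calloc-like allocator in place of kzalloc(); the function names and the injected allocator parameter are hypothetical and exist only so the failure path can be exercised.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator that always fails, to exercise the error path. */
void *failing_alloc(size_t nmemb, size_t size)
{
    (void)nmemb;
    (void)size;
    return NULL;
}

/*
 * Hypothetical stand-in for a function that needs a zeroed scratch
 * buffer. `alloc` has calloc()'s signature and plays the role that
 * kzalloc() plays in the kernel.
 */
int use_scratch(size_t len, void *(*alloc)(size_t, size_t))
{
    int error = 0;
    char *buf;

    if (len == 0)
        return -EINVAL;

    buf = alloc(1, len);
    if (!buf) {
        /* Without this assignment the caller would see success (0). */
        error = -ENOMEM;
        goto out;
    }
    buf[0] = 1; /* placeholder for the real work */
out:
    free(buf);  /* free(NULL) is a no-op */
    return error;
}
```

Calling `use_scratch(16, failing_alloc)` yields `-ENOMEM`, while `use_scratch(16, calloc)` yields 0.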
     
  • commit 7e6e1ef48fc02f3ac5d0edecbb0c6087cd758d58 upstream.

    Don't load an inode with a negative size; this causes integer overflow
    problems in the VFS.

    [ Added EXT4_ERROR_INODE() to mark file system as corrupted. -TYT]

    Fixes: a48380f769df (ext4: rename i_dir_acl to i_size_high)
    Signed-off-by: Darrick J. Wong
    Signed-off-by: Theodore Ts'o
    Signed-off-by: Greg Kroah-Hartman

    Darrick J. Wong
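
A minimal sketch of the negative-size check, assuming the on-disk size is stored as two 32-bit halves (the i_size_high name in the Fixes line suggests this). The function name and the placeholder error value are hypothetical.

```c
#include <stdint.h>

#define SKETCH_ECORRUPTED 1 /* placeholder error value for this sketch */

/*
 * Combine the low and high halves of an on-disk size and refuse any
 * value that would be negative when viewed as a signed 64-bit size.
 */
int load_inode_size(uint32_t size_lo, uint32_t size_high, int64_t *out)
{
    uint64_t raw = ((uint64_t)size_high << 32) | size_lo;

    if (raw > (uint64_t)INT64_MAX)
        return -SKETCH_ECORRUPTED; /* top bit set: size would be negative */

    *out = (int64_t)raw;
    return 0;
}
```

A high half of `0x80000000` sets the sign bit and is rejected; a high half of 1 with a low half of 5 yields 4294967301.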
     
  • commit c48ae41bafe31e9a66d8be2ced4e42a6b57fa814 upstream.

    The commit "ext4: sanity check the block and cluster size at mount
    time" should prevent any problems, but in case the superblock is
    modified while the file system is mounted, add an extra safety check
    to make sure we won't overrun the allocated buffer.

    Signed-off-by: Theodore Ts'o
    Signed-off-by: Greg Kroah-Hartman

    Theodore Ts'o
     
  • commit 5aee0f8a3f42c94c5012f1673420aee96315925a upstream.

    Fix a large number of problems with how we handle mount options in the
    superblock. For one, if the string in the superblock is long enough
    that it is not null terminated, we could run off the end of the string
    and try to interpret superblock fields as characters. It's unlikely
    this will cause a security problem, but it could result in an invalid
    parse. Also, parse_options is destructive to the string, so in some
    cases if there is a comma-separated string, it would be modified in
    the superblock. (Fortunately it only happens on file systems with a
    1k block size.)

    Signed-off-by: Theodore Ts'o
    Signed-off-by: Greg Kroah-Hartman

    Theodore Ts'o
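
Both hazards described above (a field with no terminating NUL, and a parser that modifies its input) can be avoided by parsing a bounded, terminated copy. The sketch below is a userspace illustration, not the kernel change; the 64-byte field length and the function name are assumptions, and strtok() stands in for a destructive parser.

```c
#include <string.h>

#define OPTS_FIELD_LEN 64 /* assumed size of the fixed on-disk field */

/*
 * Count comma-separated options stored in a fixed-size field that may
 * lack a terminating NUL. The field itself is never written to.
 */
int count_mount_opts(const char field[OPTS_FIELD_LEN])
{
    char copy[OPTS_FIELD_LEN + 1];
    int n = 0;

    memcpy(copy, field, OPTS_FIELD_LEN);
    copy[OPTS_FIELD_LEN] = '\0'; /* terminate even if the field is full */

    /* strtok() writes NULs over the commas, but only inside the copy. */
    for (char *tok = strtok(copy, ","); tok; tok = strtok(NULL, ","))
        n++;
    return n;
}
```

A field holding `acl,user_xattr` counts as two options and still contains the comma afterwards; a field filled with 64 non-NUL bytes is read as one option without running past the field.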
     
  • commit cd6bb35bf7f6d7d922509bf50265383a0ceabe96 upstream.

    Centralize the checks for inodes_per_block and be more strict to make
    sure the inodes_per_block_group can't end up being zero.

    Signed-off-by: Theodore Ts'o
    Reviewed-by: Andreas Dilger
    Signed-off-by: Greg Kroah-Hartman

    Theodore Ts'o
     
  • commit 30a9d7afe70ed6bd9191d3000e2ef1a34fb58493 upstream.

    The number of 'counters' elements needed in 'struct sg' is
    super_block->s_blocksize_bits + 2. Presently we have 16 'counters'
    elements in the array. This is insufficient for block sizes >= 32k. In
    such cases the memcpy operation performed in ext4_mb_seq_groups_show()
    would cause stack memory corruption.

    Fixes: c9de560ded61f
    Signed-off-by: Chandan Rajendra
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jan Kara
    Signed-off-by: Greg Kroah-Hartman

    Chandan Rajendra
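
The arithmetic in this message can be checked with a few lines of C. The helper names are hypothetical; the formula (block size bits plus 2) and the old array length of 16 come from the message itself.

```c
#define OLD_COUNTERS 16 /* array length quoted in the commit message */

/* log2 of a power-of-two block size, e.g. 4096 -> 12. */
unsigned int blocksize_bits(unsigned long blocksize)
{
    unsigned int bits = 0;

    while (blocksize > 1) {
        blocksize >>= 1;
        bits++;
    }
    return bits;
}

/* Number of 'counters' elements needed: s_blocksize_bits + 2. */
unsigned int counters_needed(unsigned long blocksize)
{
    return blocksize_bits(blocksize) + 2;
}

/* Nonzero when the old 16-element array is large enough. */
int old_array_suffices(unsigned long blocksize)
{
    return counters_needed(blocksize) <= OLD_COUNTERS;
}
```

A 16k block size needs 14 + 2 = 16 elements and just fits; 32k needs 17 and 64k needs 18, which is where the copy would run past the array.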
     
  • commit 69e43e8cc971a79dd1ee5d4343d8e63f82725123 upstream.

    'border' variable is set to a value of 2 times the block size of the
    underlying filesystem. With 64k block size, the resulting value won't
    fit into a 16-bit variable. Hence this commit changes the data type of
    'border' to 'unsigned int'.

    Fixes: c9de560ded61f
    Signed-off-by: Chandan Rajendra
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Andreas Dilger
    Signed-off-by: Greg Kroah-Hartman

    Chandan Rajendra
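
The truncation is easy to reproduce outside the kernel. This sketch assumes 'border' is computed as 2 shifted left by the block size bits (that is, twice the block size) and that unsigned int is at least 32 bits wide; the function names are hypothetical.

```c
#include <stdint.h>

/* Twice the block size, stored in a 16-bit variable (the old type). */
uint16_t border_u16(unsigned int blocksize_bits)
{
    return (uint16_t)(2u << blocksize_bits);
}

/* The same value stored in an unsigned int (the new type). */
unsigned int border_uint(unsigned int blocksize_bits)
{
    return 2u << blocksize_bits;
}
```

For a 64k block size (16 bits), `border_uint(16)` is 131072 while `border_u16(16)` wraps to 0; for a 4k block size both return 8192.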
     
  • commit 1566a48aaa10c6bb29b9a69dd8279f9a4fc41e35 upstream.

    If there is an error reported in mballoc via ext4_grp_locked_error(),
    the code is holding a spinlock, so ext4_commit_super() must not try to
    lock the buffer head, or else it will trigger a BUG:

    BUG: sleeping function called from invalid context at ./include/linux/buffer_head.h:358
    in_atomic(): 1, irqs_disabled(): 0, pid: 993, name: mount
    CPU: 0 PID: 993 Comm: mount Not tainted 4.9.0-rc1-clouder1 #62
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
    ffff880006423548 ffffffff81318c89 ffffffff819ecdd0 0000000000000166
    ffff880006423558 ffffffff810810b0 ffff880006423580 ffffffff81081153
    ffff880006e5a1a0 ffff88000690e400 0000000000000000 ffff8800064235c0
    Call Trace:
    [] dump_stack+0x67/0x9e
    [] ___might_sleep+0xf0/0x140
    [] __might_sleep+0x53/0xb0
    [] ext4_commit_super+0x19c/0x290
    [] __ext4_grp_locked_error+0x14a/0x230
    [] ? __might_sleep+0x53/0xb0
    [] ext4_mb_generate_buddy+0x1de/0x320

    Since ext4_grp_locked_error() calls ext4_commit_super with sync == 0
    (and it is the only caller which does so), avoid locking and unlocking
    the buffer in this case.

    This can result in races with ext4_commit_super() if there are other
    problems (which is what commit 4743f83990614 was trying to address),
    but a Warning is better than BUG.

    Fixes: 4743f83990614
    Reported-by: Nikolay Borisov
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jan Kara
    Signed-off-by: Greg Kroah-Hartman

    Theodore Ts'o
     

20 Nov, 2016

1 commit

  • If the block size or cluster size is insane, reject the mount. This
    is important for security reasons (although we shouldn't be just
    depending on this check).

    Ref: http://www.securityfocus.com/archive/1/539661
    Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1332506
    Reported-by: Borislav Petkov
    Reported-by: Nikolay Borisov
    Signed-off-by: Theodore Ts'o
    Cc: stable@vger.kernel.org

    Theodore Ts'o
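
A rough sketch of what such a mount-time sanity check could look like. It assumes the superblock stores block and cluster sizes as log2 values relative to 1 KiB; the limits and the function name below are illustrative assumptions, not values taken from this commit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed limits for this sketch (log2 of sizes in bytes). */
#define MIN_BLOCK_LOG   10 /* 1 KiB */
#define MAX_BLOCK_LOG   16 /* 64 KiB */
#define MAX_CLUSTER_LOG 30 /* 1 GiB */

/*
 * Both arguments are on-disk values: log2(size) - MIN_BLOCK_LOG.
 * Returns false when the mount should be rejected.
 */
bool geometry_is_sane(uint32_t log_block_size, uint32_t log_cluster_size)
{
    if (log_block_size > MAX_BLOCK_LOG - MIN_BLOCK_LOG)
        return false; /* block size out of range */
    if (log_cluster_size < log_block_size)
        return false; /* a cluster cannot be smaller than a block */
    if (log_cluster_size > MAX_CLUSTER_LOG - MIN_BLOCK_LOG)
        return false; /* cluster size out of range */
    return true;
}
```

Checking the raw log2 values before shifting avoids ever computing a size from an out-of-range shift count.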
     

15 Oct, 2016

2 commits


13 Oct, 2016

2 commits

  • The sysfs file /sys/fs/ext4/features/encryption was present on kernels
    compiled with CONFIG_EXT4_FS_ENCRYPTION=n. This was misleading because
    such kernels do not actually support ext4 encryption. Therefore, only
    provide this file on kernels compiled with CONFIG_EXT4_FS_ENCRYPTION=y.

    Note: since the ext4 feature files are all hardcoded to have contents
    of "supported", it really is the presence or absence of the file that is
    significant, not the contents (and this change reflects that).

    Signed-off-by: Eric Biggers
    Signed-off-by: Theodore Ts'o
    Cc: stable@vger.kernel.org

    Eric Biggers
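
Since only the presence of the file matters, a userspace probe can test for existence and ignore the contents. A minimal sketch using POSIX access(); the helper name is hypothetical, and the sysfs path is the one quoted in the message.

```c
#include <stdbool.h>
#include <unistd.h>

/*
 * Returns true when a feature file exists at `path`. The file's
 * contents are deliberately not read: presence is the signal.
 */
bool feature_file_present(const char *path)
{
    return access(path, F_OK) == 0;
}
```

A caller would pass `"/sys/fs/ext4/features/encryption"` and treat a false result as "this kernel does not advertise ext4 encryption".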
     
  • Recent commits require line-continuing printks to use KERN_CONT.

    Update super.c to use KERN_CONT and use vsprintf extension %pV to
    avoid a printk/vprintk/printk("\n") sequence as well.

    Signed-off-by: Joe Perches
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jan Kara

    Joe Perches
     

12 Oct, 2016

1 commit

  • The mapping_set_error() helper sets the correct AS_ flag for the mapping
    so there is no reason to open code it. Use the helper directly.

    [akpm@linux-foundation.org: be honest about conversion from -ENXIO to -EIO]
    Link: http://lkml.kernel.org/r/20160912111608.2588-2-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

11 Oct, 2016

3 commits

  • Pull more vfs updates from Al Viro:
    "->rename2() work from Miklos + current_time() from Deepa"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs: Replace current_fs_time() with current_time()
    fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
    fs: Replace CURRENT_TIME with current_time() for inode timestamps
    fs: proc: Delete inode time initializations in proc_alloc_inode()
    vfs: Add current_time() api
    vfs: add note about i_op->rename changes to porting
    fs: rename "rename2" i_op to "rename"
    vfs: remove unused i_op->rename
    fs: make remaining filesystems use .rename2
    libfs: support RENAME_NOREPLACE in simple_rename()
    fs: support RENAME_NOREPLACE for local filesystems
    ncpfs: fix unused variable warning

    Linus Torvalds
     
  • Pull vfs xattr updates from Al Viro:
    "xattr stuff from Andreas

    This completes the switch to xattr_handler ->get()/->set() from
    ->getxattr/->setxattr/->removexattr"

    * 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    vfs: Remove {get,set,remove}xattr inode operations
    xattr: Stop calling {get,set,remove}xattr inode operations
    vfs: Check for the IOP_XATTR flag in listxattr
    xattr: Add __vfs_{get,set,remove}xattr helpers
    libfs: Use IOP_XATTR flag for empty directory handling
    vfs: Use IOP_XATTR flag for bad-inode handling
    vfs: Add IOP_XATTR inode operations flag
    vfs: Move xattr_resolve_name to the front of fs/xattr.c
    ecryptfs: Switch to generic xattr handlers
    sockfs: Get rid of getxattr iop
    sockfs: getxattr: Fail with -EOPNOTSUPP for invalid attribute names
    kernfs: Switch to generic xattr handlers
    hfs: Switch to generic xattr handlers
    jffs2: Remove jffs2_{get,set,remove}xattr macros
    xattr: Remove unnecessary NULL attribute name check

    Linus Torvalds
     
  • Pull misc vfs updates from Al Viro:
    "Assorted misc bits and pieces.

    There are several single-topic branches left after this (rename2
    series from Miklos, current_time series from Deepa Dinamani, xattr
    series from Andreas, uaccess stuff from me) and I'd prefer to
    send those separately"

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (39 commits)
    proc: switch auxv to use of __mem_open()
    hpfs: support FIEMAP
    cifs: get rid of unused arguments of CIFSSMBWrite()
    posix_acl: uapi header split
    posix_acl: xattr representation cleanups
    fs/aio.c: eliminate redundant loads in put_aio_ring_file
    fs/internal.h: add const to ns_dentry_operations declaration
    compat: remove compat_printk()
    fs/buffer.c: make __getblk_slow() static
    proc: unsigned file descriptors
    fs/file: more unsigned file descriptors
    fs: compat: remove redundant check of nr_segs
    cachefiles: Fix attempt to read i_blocks after deleting file [ver #2]
    cifs: don't use memcpy() to copy struct iov_iter
    get rid of separate multipage fault-in primitives
    fs: Avoid premature clearing of capabilities
    fs: Give dentry to inode_change_ok() instead of inode
    fuse: Propagate dentry down to inode_change_ok()
    ceph: Propagate dentry down to inode_change_ok()
    xfs: Propagate dentry down to inode_change_ok()
    ...

    Linus Torvalds
     

08 Oct, 2016

5 commits

  • Al Viro
     
  • Merge updates from Andrew Morton:

    - fsnotify updates

    - ocfs2 updates

    - all of MM

    * emailed patches from Andrew Morton: (127 commits)
    console: don't prefer first registered if DT specifies stdout-path
    cred: simpler, 1D supplementary groups
    CREDITS: update Pavel's information, add GPG key, remove snail mail address
    mailmap: add Johan Hovold
    .gitattributes: set git diff driver for C source code files
    uprobes: remove function declarations from arch/{mips,s390}
    spelling.txt: "modeled" is spelt correctly
    nmi_backtrace: generate one-line reports for idle cpus
    arch/tile: adopt the new nmi_backtrace framework
    nmi_backtrace: do a local dump_stack() instead of a self-NMI
    nmi_backtrace: add more trigger_*_cpu_backtrace() methods
    min/max: remove sparse warnings when they're nested
    Documentation/filesystems/proc.txt: add more description for maps/smaps
    mm, proc: fix region lost in /proc/self/smaps
    proc: fix timerslack_ns CAP_SYS_NICE check when adjusting self
    proc: add LSM hook checks to /proc//timerslack_ns
    proc: relax /proc//timerslack_ns capability requirements
    meminfo: break apart a very long seq_printf with #ifdefs
    seq/proc: modify seq_put_decimal_[u]ll to take a const char *, not char
    proc: faster /proc/*/status
    ...

    Linus Torvalds
     
  • These inode operations are no longer used; remove them.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Al Viro

    Andreas Gruenbacher
     
  • To support DAX pmd mappings with unmodified applications, filesystems
    need to align an mmap address by the pmd size.

    Call thp_get_unmapped_area() from f_op->get_unmapped_area.

    Note, there is no change in behavior for a non-DAX file.

    Link: http://lkml.kernel.org/r/1472497881-9323-3-git-send-email-toshi.kani@hpe.com
    Signed-off-by: Toshi Kani
    Cc: Dan Williams
    Cc: Matthew Wilcox
    Cc: Ross Zwisler
    Cc: Kirill A. Shutemov
    Cc: Dave Chinner
    Cc: Jan Kara
    Cc: Theodore Ts'o
    Cc: Andreas Dilger
    Cc: Mike Kravetz
    Cc: "Kirill A. Shutemov"
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toshi Kani
     
  • Pull ext4 updates from Ted Ts'o:
    "Lots of bug fixes and cleanups"

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (40 commits)
    ext4: remove unused variable
    ext4: use journal inode to determine journal overhead
    ext4: create function to read journal inode
    ext4: unmap metadata when zeroing blocks
    ext4: remove plugging from ext4_file_write_iter()
    ext4: allow unlocked direct IO when pages are cached
    ext4: require encryption feature for EXT4_IOC_SET_ENCRYPTION_POLICY
    fscrypto: use standard macros to compute length of fname ciphertext
    ext4: do not unnecessarily null-terminate encrypted symlink data
    ext4: release bh in make_indexed_dir
    ext4: Allow parallel DIO reads
    ext4: allow DAX writeback for hole punch
    jbd2: fix lockdep annotation in add_transaction_credits()
    blockgroup_lock.h: simplify definition of NR_BG_LOCKS
    blockgroup_lock.h: remove debris from bgl_lock_ptr() conversion
    fscrypto: make filename crypto functions return 0 on success
    fscrypto: rename completion callbacks to reflect usage
    fscrypto: remove unnecessary includes
    fscrypto: improved validation when loading inode encryption metadata
    ext4: fix memory leak when symlink decryption fails
    ...

    Linus Torvalds
     

30 Sep, 2016

10 commits

  • Signed-off-by: Eric Engestrom

    Eric Engestrom
     
  • When a file system contains an internal journal that has not been
    loaded, use the journal inode's i_size field to determine its
    contribution to the file system's overhead. (The journal's j_maxlen
    field is normally used to determine its size, but it's unavailable when
    the journal has not been loaded.)

    Signed-off-by: Eric Whitney
    Signed-off-by: Theodore Ts'o

    Eric Whitney
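
As a rough illustration, the journal's contribution can be derived from the inode size by converting bytes to blocks. The function name is hypothetical, and the sketch assumes the overhead is simply i_size divided by the block size, expressed as a shift.

```c
#include <stdint.h>

/*
 * Number of filesystem blocks occupied by a journal whose inode
 * reports `i_size` bytes, for a block size of 2^blocksize_bits bytes.
 */
uint64_t journal_overhead_blocks(uint64_t i_size, unsigned int blocksize_bits)
{
    return i_size >> blocksize_bits;
}
```

A 128 MiB journal on a filesystem with 4 KiB blocks (12 bits) accounts for 32768 blocks.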
     
  • Factor out the code used in ext4_get_journal() to read a valid journal
    inode from storage, enabling its reuse in other functions.

    Signed-off-by: Eric Whitney
    Signed-off-by: Theodore Ts'o

    Eric Whitney
     
  • When zeroing blocks for DAX allocations, we also have to unmap aliases
    in the block device mappings. Otherwise writeback can overwrite zeros
    with stale data from block device page cache.

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o
    Cc: stable@kernel.org

    Jan Kara
     
  • do_blockdev_direct_IO() takes care of properly plugging direct IO so
    there's no need to plug again inside ext4_file_write_iter().

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     
  • Currently we do not allow unlocked (meaning without inode_lock) direct
    IO when the file has any pages cached. This check is not needed anymore
    as we keep inode lock until ext4_direct_IO_write() and thus can happily
    writeback and evict any pages conflicting with current direct IO write.

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     
  • ...otherwise a user can enable encryption for certain files even
    when the filesystem is unable to support it.
    Such a case would be a filesystem created by mkfs.ext4's default
    settings, 1KiB block size. Ext4 supports encryption only when block size
    is equal to PAGE_SIZE.
    But this constraint is only checked when the encryption feature flag
    is set.

    Signed-off-by: Richard Weinberger
    Signed-off-by: Theodore Ts'o

    Richard Weinberger
     
  • Null-terminating the fscrypt_symlink_data on read is unnecessary because
    it is not string data --- it contains binary ciphertext.

    Signed-off-by: Eric Biggers
    Signed-off-by: Theodore Ts'o

    Eric Biggers
     
  • The commit 6050d47adcad: "ext4: bail out from make_indexed_dir() on
    first error" could end up leaking bh2 in the error path.

    [ Also avoid renaming bh2 to bh, which just confuses things --tytso ]

    Cc: stable@vger.kernel.org
    Signed-off-by: yangsheng
    Signed-off-by: Theodore Ts'o

    gmail
     
  • We can easily support parallel direct IO reads. We only have to make
    sure we cannot expose uninitialized data by reading an allocated block
    to which data was not written yet, or which was already truncated. That
    is easily achieved by holding inode_lock in shared mode - that excludes
    all writes, truncates, hole punches. We also have to guard against page
    writeback allocating blocks for delay-allocated pages - that race is
    handled by the fact that we write back all the pages in the affected
    range and the lock protects us from new pages being created there.

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     

27 Sep, 2016

1 commit


22 Sep, 2016

3 commits

  • Currently when doing a DAX hole punch with ext4 we fail to do a writeback.
    This is because the logic around filemap_write_and_wait_range() in
    ext4_punch_hole() only looks for dirty page cache pages in the radix tree,
    not for dirty DAX exceptional entries.

    Signed-off-by: Ross Zwisler
    Reviewed-by: Jan Kara
    Cc:
    Signed-off-by: Theodore Ts'o

    Ross Zwisler
     
  • inode_change_ok() will be responsible for clearing capabilities and IMA
    extended attributes and as such will need dentry. Give it as an argument
    to inode_change_ok() instead of an inode. Also rename inode_change_ok()
    to setattr_prepare() to better reflect that it also does some
    modifications in addition to checks.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Jan Kara
     
  • When file permissions are modified via chmod(2) and the user is not in
    the owning group or capable of CAP_FSETID, the setgid bit is cleared in
    inode_change_ok(). Setting a POSIX ACL via setxattr(2) sets the file
    permissions as well as the new ACL, but doesn't clear the setgid bit in
    a similar way; this allows bypassing the check in chmod(2). Fix that.

    References: CVE-2016-7097
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Jeff Layton
    Signed-off-by: Jan Kara
    Signed-off-by: Andreas Gruenbacher

    Jan Kara
     

16 Sep, 2016

2 commits

  • Several filename crypto functions: fname_decrypt(),
    fscrypt_fname_disk_to_usr(), and fscrypt_fname_usr_to_disk(), returned
    the output length on success or -errno on failure. However, the output
    length was redundant with the value written to 'oname->len'. It is also
    potentially error-prone to make callers have to check for '< 0' instead
    of '!= 0'.

    Therefore, make these functions return 0 instead of a length, and make
    the callers who cared about the return value being a length use
    'oname->len' instead. For consistency also make other callers check for
    a nonzero result rather than a negative result.

    This change also fixes the inconsistency of fname_encrypt() actually
    already returning 0 on success, not a length like the other filename
    crypto functions and as documented in its function comment.

    Signed-off-by: Eric Biggers
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Andreas Dilger
    Acked-by: Jaegeuk Kim

    Eric Biggers
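
The calling convention this change settles on (return 0 or -errno, report the length through the output structure) can be sketched as follows. The struct and function here are hypothetical stand-ins, not the fscrypto types, and the body just copies bytes in place of any real encryption or decryption.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical output buffer: data pointer plus the length written. */
struct name_buf {
    char *name;
    size_t len;
};

/*
 * Write `src_len` bytes into `oname` (capacity `cap`). Returns 0 on
 * success and a negative errno on failure; the output length is
 * reported only through oname->len, never through the return value.
 */
int copy_name(const char *src, size_t src_len, struct name_buf *oname,
              size_t cap)
{
    if (src_len > cap)
        return -ENAMETOOLONG;
    memcpy(oname->name, src, src_len);
    oname->len = src_len;
    return 0;
}
```

Callers can then write `if (copy_name(...))` to detect any failure and read `oname->len` when they need the length, rather than distinguishing a negative result from a positive length.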
     
  • This bug was introduced in v4.8-rc1.

    Signed-off-by: Eric Biggers
    Signed-off-by: Theodore Ts'o
    Cc: stable@vger.kernel.org

    Eric Biggers