30 Sep, 2016

1 commit


16 Sep, 2016

1 commit

  • Several filename crypto functions: fname_decrypt(),
    fscrypt_fname_disk_to_usr(), and fscrypt_fname_usr_to_disk(), returned
    the output length on success or -errno on failure. However, the output
    length was redundant with the value written to 'oname->len'. It is also
    potentially error-prone to make callers have to check for '< 0' instead
    of '!= 0'.

    Therefore, make these functions return 0 instead of a length, and make
    the callers who cared about the return value being a length use
    'oname->len' instead. For consistency also make other callers check for
    a nonzero result rather than a negative result.

    This change also fixes the inconsistency of fname_encrypt() actually
    already returning 0 on success, not a length like the other filename
    crypto functions and as documented in its function comment.

    Signed-off-by: Eric Biggers
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Andreas Dilger
    Acked-by: Jaegeuk Kim

    Eric Biggers
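A minimal userspace sketch of the calling convention described above. It uses a hypothetical `demo_fname_decode()` in place of the real fscrypt functions: success is reported as 0, failure as -errno, and the output length is passed back only through `oname->len`. The `struct demo_str` type is an illustrative stand-in, not the kernel's definition.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a name buffer with an explicit length. */
struct demo_str {
	char name[64];
	size_t len;
};

/*
 * Hypothetical decoder following the convention above: return 0 on
 * success or -errno on failure; the length is reported via oname->len.
 */
static int demo_fname_decode(const char *iname, size_t ilen,
			     struct demo_str *oname)
{
	if (ilen == 0)
		return -EINVAL;
	if (ilen > sizeof(oname->name))
		return -ENAMETOOLONG;
	memcpy(oname->name, iname, ilen); /* a real decoder would decrypt here */
	oname->len = ilen;
	return 0;
}

/* Callers test for a nonzero result and read the length from oname->len. */
static int demo_caller(const char *iname, size_t ilen, size_t *out_len)
{
	struct demo_str oname;
	int err = demo_fname_decode(iname, ilen, &oname);

	if (err)
		return err;
	*out_len = oname.len;
	return 0;
}
```

Because success is always exactly 0, `if (err)` is the only check a caller needs, and there is no second copy of the length that could disagree with `oname->len`.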
     

11 Jul, 2016

1 commit


25 May, 2016

1 commit

  • Pull ext4 updates from Ted Ts'o:
    "Fix a number of bugs, most notably a potential stale data exposure
    after a crash and a potential BUG_ON crash if a file has the data
    journalling flag enabled while it has dirty delayed allocation blocks
    that haven't been written yet. Also fix potential crashes in the new
    project quota code and on a maliciously corrupted file system.

    In addition, fix some DAX-specific bugs, including when there is a
    transient ENOSPC situation and races between writes via direct I/O and
    an mmap'ed segment that could lead to lost I/O.

    Finally the usual set of miscellaneous cleanups"

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (23 commits)
    ext4: pre-zero allocated blocks for DAX IO
    ext4: refactor direct IO code
    ext4: fix race in transient ENOSPC detection
    ext4: handle transient ENOSPC properly for DAX
    dax: call get_blocks() with create == 1 for write faults to unwritten extents
    ext4: remove unmeetable inconsisteny check from ext4_find_extent()
    jbd2: remove excess descriptions for handle_s
    ext4: remove unnecessary bio get/put
    ext4: silence UBSAN in ext4_mb_init()
    ext4: address UBSAN warning in mb_find_order_for_block()
    ext4: fix oops on corrupted filesystem
    ext4: fix check of dqget() return value in ext4_ioctl_setproject()
    ext4: clean up error handling when orphan list is corrupted
    ext4: fix hang when processing corrupted orphaned inode list
    ext4: remove trailing \n from ext4_warning/ext4_error calls
    ext4: fix races between changing inode journal mode and ext4_writepages
    ext4: handle unwritten or delalloc buffers before enabling data journaling
    ext4: fix jbd2 handle extension in ext4_ext_truncate_extend_restart()
    ext4: do not ask jbd2 to write data for delalloc buffers
    jbd2: add support for avoiding data writes during transaction commits
    ...

    Linus Torvalds
     

13 May, 2016

1 commit


24 Apr, 2016

1 commit

  • If a directory has a large number of empty blocks, iterating over all
    of them can take a long time, leading to scheduler warnings and users
    getting irritated when they can't kill a process in the middle of one
    of these long-running readdir operations. Fix this by adding checks to
    ext4_readdir() and ext4_htree_fill_tree().

    This was reverted earlier due to a typo in the original commit where I
    experimented with using signal_pending() instead of
    fatal_signal_pending(). The test was in the wrong place if we were
    going to return signal_pending() since we would end up returning
    duplicate entries. See 9f2394c9be47 for a more detailed explanation.

    Added a fix, as suggested by Linus, to check for signal_pending() in
    the filldir() functions.

    Reported-by: Benjamin LaHaise
    Google-Bug-Id: 27880676
    Signed-off-by: Theodore Ts'o

    Theodore Ts'o
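A simplified userspace model of where a non-fatal interruption check can safely live. The `demo_*` names are hypothetical stand-ins for the kernel's directory context and the filldir() callbacks in fs/readdir.c. The actor refuses an entry when a signal is (simulated as) pending. Because the walker advances `pos` only after an entry has been accepted, the next call resumes at the refused entry and nothing is returned twice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_ENTRIES 16

/* Hypothetical, simplified analogue of a directory iteration context. */
struct demo_ctx {
	/* Returns true to accept an entry, false to stop the walk. */
	bool (*actor)(struct demo_ctx *ctx, int entry);
	size_t pos;                      /* index of the next entry to deliver */
	int delivered[DEMO_MAX_ENTRIES]; /* entries copied to the caller */
	size_t ndelivered;
	size_t budget;                   /* entries accepted before a "signal" */
};

/* Filldir-style actor: stops (without consuming) when a signal is pending. */
static bool demo_filldir(struct demo_ctx *ctx, int entry)
{
	if (ctx->budget == 0 || ctx->ndelivered == DEMO_MAX_ENTRIES)
		return false;
	ctx->budget--;
	ctx->delivered[ctx->ndelivered++] = entry;
	return true;
}

/*
 * Filesystem-side walk: pos moves forward only after the actor accepts
 * an entry, so a stop leaves pos pointing at the undelivered entry.
 */
static void demo_readdir(const int *entries, size_t n, struct demo_ctx *ctx)
{
	while (ctx->pos < n) {
		if (!ctx->actor(ctx, entries[ctx->pos]))
			return;
		ctx->pos++;
	}
}
```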
     

11 Apr, 2016

1 commit

  • This reverts commit 1028b55bafb7611dda1d8fed2aeca16a436b7dff.

    It's broken: it makes ext4 return an error at an invalid point, causing
    the readdir wrappers to write the position of the last successful
    directory entry into the position field, which means that the next
    readdir will now return that last successful entry _again_.

    You can only return fatal errors (that terminate the readdir directory
    walk) from within the filesystem readdir functions, the "normal" errors
    (that happen when the readdir buffer fills up, for example) happen in
    the iterator where we know the position of the actual failing entry.

    I do have a very different patch that does the "signal_pending()"
    handling inside the iterator function where it is allowable, but while
    that one passes all the sanity checks, I screwed up something like four
    times while emailing it out, so I'm not going to commit it today.

    So my track record is not good enough, and the stars will have to align
    better before that one gets committed. And it would be good to get some
    review too, of course, since celestial alignments are always an iffy
    debugging model.

    IOW, let's just revert the commit that caused the problem for now.

    Reported-by: Greg Thelen
    Cc: Theodore Ts'o
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

08 Apr, 2016

1 commit

  • Pull ext4 bugfixes from Ted Ts'o:
    "These changes contain a fix for overlayfs interacting with some
    (badly behaved) dentry code in various file systems. These have been
    reviewed by Al and the respective file system maintainers and are going
    through the ext4 tree for convenience.

    This also has a few ext4 encryption bug fixes that were discovered in
    Android testing (yes, we will need to get these sync'ed up with the
    fs/crypto code; I'll take care of that). It also has some bug fixes
    and a change to ignore the legacy quota options to allow for xfstests
    regression testing of ext4's internal quota feature and to be more
    consistent with how xfs handles this case"

    * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    ext4: ignore quota mount options if the quota feature is enabled
    ext4 crypto: fix some error handling
    ext4: avoid calling dquot_get_next_id() if quota is not enabled
    ext4: retry block allocation for failed DIO and DAX writes
    ext4: add lockdep annotations for i_data_sem
    ext4: allow readdir()'s of large empty directories to be interrupted
    btrfs: fix crash/invalid memory access on fsync when using overlayfs
    ext4 crypto: use dget_parent() in ext4_d_revalidate()
    ext4: use file_dentry()
    ext4: use dget_parent() in ext4_file_open()
    nfs: use file_dentry()
    fs: add file_dentry()
    ext4 crypto: don't let data integrity writebacks fail with ENOMEM
    ext4: check if in-inode xattr is corrupted in ext4_expand_extra_isize_ea()

    Linus Torvalds
     

05 Apr, 2016

1 commit

  • PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
    ago with the promise that one day it would be possible to implement the
    page cache with bigger chunks than PAGE_SIZE.

    This promise never materialized, and it likely never will.

    We have many places where PAGE_CACHE_SIZE is assumed to be equal to
    PAGE_SIZE, and it's a constant source of confusion whether the
    PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
    especially on the border between fs and mm.

    Globally switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
    breakage to be doable.

    Let's stop pretending that pages in page cache are special. They are
    not.

    The changes are pretty straight-forward:

    - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

    - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

    - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

    - page_cache_get() -> get_page();

    - page_cache_release() -> put_page();

    This patch contains automated changes generated with coccinelle using
    the script below. For some reason, coccinelle doesn't patch header
    files. I've called spatch for them manually.

    The only adjustment after coccinelle is reverting the changes to the
    PAGE_CACHE_ALIGN definition: we are going to drop it later.

    There are a few places in the code where coccinelle didn't reach. I'll
    fix them manually in a separate patch. Comments and documentation will
    also be addressed in a separate patch.

    virtual patch

    @@
    expression E;
    @@
    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    expression E;
    @@
    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    @@
    - PAGE_CACHE_SHIFT
    + PAGE_SHIFT

    @@
    @@
    - PAGE_CACHE_SIZE
    + PAGE_SIZE

    @@
    @@
    - PAGE_CACHE_MASK
    + PAGE_MASK

    @@
    expression E;
    @@
    - PAGE_CACHE_ALIGN(E)
    + PAGE_ALIGN(E)

    @@
    expression E;
    @@
    - page_cache_get(E)
    + get_page(E)

    @@
    expression E;
    @@
    - page_cache_release(E)
    + put_page(E)

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
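A small userspace sketch of why the conversion is mechanical, assuming 4 KiB pages (a PAGE_SHIFT of 12; the real value is architecture-specific). With PAGE_CACHE_SHIFT defined as PAGE_SHIFT, the shift by `PAGE_CACHE_SHIFT - PAGE_SHIFT` is a shift by zero, so the expression reduces to its operand. The macros are re-declared here only for illustration, and the helpers `demo_index()` and `demo_offset()` are hypothetical.

```c
#include <assert.h>

/* Assumption: 4 KiB pages. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))
/* Round up to the next page boundary. */
#define PAGE_ALIGN(addr) (((addr) + PAGE_SIZE - 1) & PAGE_MASK)

/* The old alias was identical to PAGE_SHIFT, which makes the rewrite safe. */
#define PAGE_CACHE_SHIFT PAGE_SHIFT

/* Byte offset -> page index within a file. */
static unsigned long demo_index(unsigned long pos)
{
	return pos >> PAGE_SHIFT;
}

/* Byte offset -> offset inside its page. */
static unsigned long demo_offset(unsigned long pos)
{
	return pos & ~PAGE_MASK;
}
```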
     

31 Mar, 2016

1 commit

  • If a directory has a large number of empty blocks, iterating over all
    of them can take a long time, leading to scheduler warnings and users
    getting irritated when they can't kill a process in the middle of one
    of these long-running readdir operations. Fix this by adding checks to
    ext4_readdir() and ext4_htree_fill_tree().

    Reported-by: Benjamin LaHaise
    Google-Bug-Id: 27880676
    Signed-off-by: Theodore Ts'o

    Theodore Ts'o
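A minimal sketch of the idea in this commit, using a hypothetical flag `demo_fatal_pending` in place of the kernel's fatal_signal_pending(current). The check runs on every block, including empty ones, so a walk over a directory made mostly of empty blocks can still be stopped. Per the 24 Apr, 2016 entry, only a fatal signal may end the walk with an error at this level; the -EINTR return value is an assumption of this userspace sketch.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>

/* Set from a signal handler in a real program; tests set it directly. */
static volatile sig_atomic_t demo_fatal_pending;

/*
 * Walk directory blocks, skipping empty ones, and check for a pending
 * fatal signal on each iteration. Returns the number of live entries,
 * or -EINTR if the walk was interrupted.
 */
static long demo_count_entries(const int *entries_per_block, size_t nblocks)
{
	long total = 0;

	for (size_t i = 0; i < nblocks; i++) {
		if (demo_fatal_pending)
			return -EINTR;
		if (entries_per_block[i] == 0)
			continue; /* empty block: nothing to emit */
		total += entries_per_block[i];
	}
	return total;
}
```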
     

23 Mar, 2016

1 commit


16 Feb, 2016

1 commit


08 Feb, 2016

1 commit


18 Oct, 2015

2 commits


01 Jun, 2015

2 commits


19 May, 2015

2 commits

  • This is a pretty massive patch which does a number of different things:

    1) The per-inode encryption information is now stored in an allocated
    data structure, ext4_crypt_info, instead of directly in the inode.
    This reduces the size of an in-memory inode when it is not using
    encryption.

    2) We drop the ext4_fname_crypto_ctx entirely, and use the per-inode
    encryption structure instead. This removes an unnecessary memory
    allocation and free for the fname_crypto_ctx as well as allowing us
    to reuse the ctfm in a directory for multiple lookups and file
    creations.

    3) We also cache the inode's policy information in the ext4_crypt_info
    structure so we don't have to continually read it out of the
    extended attributes.

    4) We now keep the keyring key in the inode's encryption structure
    instead of releasing it after we are done using it to derive the
    per-inode key. This allows us to test to see if the key has been
    revoked; if it has, we prevent the use of the derived key and free
    it.

    5) When an inode is released (or when the derived key is freed), we
    will use memzero_explicit() to zero out the derived key, so it's not
    left hanging around in memory. This implies that when a user logs
    out, it is important to first revoke the key, and then unlink it,
    and then finally, to use "echo 3 > /proc/sys/vm/drop_caches" to
    release any decrypted pages and dcache entries from the system
    caches.

    6) All this, and we also shrink the number of lines of code by around
    100. :-)

    Signed-off-by: Theodore Ts'o

    Theodore Ts'o
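A userspace sketch of points 1 and 5, with hypothetical `demo_*` names standing in for ext4_crypt_info and the inode. An unencrypted inode carries only a NULL pointer, and the derived key is wiped before the structure is freed. The kernel provides memzero_explicit() for the wipe; here a volatile-pointer loop is assumed as a portable substitute, so the compiler must perform the stores.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_KEY_SIZE 64

/* Hypothetical analogue of ext4_crypt_info: allocated only when needed. */
struct demo_crypt_info {
	unsigned char key[DEMO_KEY_SIZE]; /* derived per-inode key */
};

/* Only a pointer lives in the inode; it stays NULL for unencrypted inodes. */
struct demo_inode {
	unsigned long ino;
	struct demo_crypt_info *ci;
};

/* Zero memory through a volatile pointer so the stores are not elided. */
static void demo_wipe(void *p, size_t n)
{
	volatile unsigned char *v = p;

	while (n--)
		*v++ = 0;
}

static void demo_free_crypt_info(struct demo_inode *inode)
{
	if (!inode->ci)
		return;
	demo_wipe(inode->ci->key, sizeof(inode->ci->key));
	free(inode->ci);
	inode->ci = NULL;
}

static int demo_set_key(struct demo_inode *inode, const unsigned char *key)
{
	struct demo_crypt_info *ci;

	demo_free_crypt_info(inode); /* drop (and wipe) any previous key */
	ci = malloc(sizeof(*ci));
	if (!ci)
		return -1;
	memcpy(ci->key, key, DEMO_KEY_SIZE);
	inode->ci = ci;
	return 0;
}
```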
     
  • Signed-off-by: Theodore Ts'o

    Theodore Ts'o
     

02 May, 2015

1 commit


12 Apr, 2015

2 commits


03 Apr, 2015

1 commit


30 Aug, 2014

1 commit


29 Jul, 2014

1 commit

  • Before converting an inline directory to a regular directory, check
    the directory entries to make sure they're not obviously broken.
    This helps us to avoid a BUG_ON if one of the dirents is trashed.

    Signed-off-by: Darrick J. Wong
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Andreas Dilger

    Darrick J. Wong
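A simplified sketch of the kind of sanity checks meant here. It assumes the ext4_dir_entry_2 on-disk layout (inode, rec_len, name_len, file_type, then the name) and that host byte order matches the little-endian on-disk order. The checks follow the spirit of ext4's directory-entry validation but are not its exact code; `demo_check_dirents()` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fixed header of a directory entry; the name bytes follow it. */
struct demo_dirent {
	uint32_t inode;
	uint16_t rec_len;
	uint8_t  name_len;
	uint8_t  file_type;
};

/* Minimum record length for a name, rounded up to a multiple of 4. */
#define DEMO_DIR_REC_LEN(name_len) (((name_len) + 8 + 3) & ~3)

/*
 * Walk every entry in buf[0..size) and reject obviously broken ones.
 * Returns 0 if all entries look sane, -1 otherwise.
 */
static int demo_check_dirents(const unsigned char *buf, size_t size)
{
	size_t off = 0;

	while (off < size) {
		struct demo_dirent de;

		if (size - off < sizeof(de))
			return -1; /* header runs past the end */
		memcpy(&de, buf + off, sizeof(de));
		if (de.rec_len < DEMO_DIR_REC_LEN(1))
			return -1; /* smaller than any valid entry */
		if (de.rec_len % 4 != 0)
			return -1; /* not 4-byte aligned */
		if (de.rec_len < DEMO_DIR_REC_LEN(de.name_len))
			return -1; /* name does not fit in the record */
		if (de.rec_len > size - off)
			return -1; /* record crosses the end of the data */
		off += de.rec_len;
	}
	return 0;
}
```

Since every accepted record is at least 12 bytes long, the walk always makes progress, so a trashed `rec_len` of 0 is rejected instead of looping forever.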
     

28 May, 2014

1 commit


24 Jan, 2014

1 commit


29 Aug, 2013

1 commit


29 Jun, 2013

1 commit


20 Apr, 2013

1 commit

  • Zach reported a problem: if inline data is enabled, we don't
    distinguish between the offsets of '.' and '..'. So a getdents call
    will fail if the user only wants to get '.', and, what's worse, if a
    conversion happens while the user is calling getdents repeatedly,
    he/she may get the same entry twice.

    In theory, a dir block would also fail when it is converted to a
    hash-indexed dir, since f_pos would become a hash value rather than the
    real offset, but this doesn't happen in practice. A deeper
    investigation shows that we use a hash-based solution even for a normal
    dir if the dir_index feature is enabled.

    So this patch adds a new htree_inlinedir_to_tree() for inline dirs,
    and if we find that the hash index is supported, we do the same as we
    do for a dir block.

    Reported-by: Zach Brown
    Signed-off-by: Tao Ma
    Signed-off-by: "Theodore Ts'o"

    Tao Ma
     

03 Mar, 2013

2 commits

  • Pull ext4 bug fixes from Ted Ts'o:
    "Various bug fixes for ext4. The most important is a fix for the new
    extent cache's slab shrinker which can cause significant, user-visible
    pauses when the system is under memory pressure."

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    ext4: enable quotas before orphan cleanup
    ext4: don't allow quota mount options when quota feature enabled
    ext4: fix a warning from sparse check for ext4_dir_llseek
    ext4: convert number of blocks to clusters properly
    ext4: fix possible memory leak in ext4_remount()
    jbd2: fix ERR_PTR dereference in jbd2__journal_start
    ext4: use percpu counter for extent cache count
    ext4: optimize ext4_es_shrink()

    Linus Torvalds
     
  • ext4_dir_llseek is only used as a callback function, and no one calls
    it directly. So make it a static function in order to remove a
    warning message from sparse check.

    Signed-off-by: Zheng Liu
    Signed-off-by: "Theodore Ts'o"

    Zheng Liu
     

27 Feb, 2013

1 commit

  • Pull vfs pile (part one) from Al Viro:
    "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
    locking violations, etc.

    The most visible changes here are death of FS_REVAL_DOT (replaced with
    "has ->d_weak_revalidate()") and a new helper getting from struct file
    to inode. Some bits of preparation to xattr method interface changes.

    Misc patches by various people sent this cycle *and* ocfs2 fixes from
    several cycles ago that should've been upstream right then.

    PS: the next vfs pile will be xattr stuff."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
    saner proc_get_inode() calling conventions
    proc: avoid extra pde_put() in proc_fill_super()
    fs: change return values from -EACCES to -EPERM
    fs/exec.c: make bprm_mm_init() static
    ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
    ocfs2: fix possible use-after-free with AIO
    ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
    get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
    target: writev() on single-element vector is pointless
    export kernel_write(), convert open-coded instances
    fs: encode_fh: return FILEID_INVALID if invalid fid_type
    kill f_vfsmnt
    vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
    nfsd: handle vfs_getattr errors in acl protocol
    switch vfs_getattr() to struct path
    default SET_PERSONALITY() in linux/elf.h
    ceph: prepopulate inodes only when request is aborted
    d_hash_and_lookup(): export, switch open-coded instances
    9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
    9p: split dropping the acls from v9fs_set_create_acl()
    ...

    Linus Torvalds
     

23 Feb, 2013

1 commit


29 Jan, 2013

1 commit

  • Commit b0336e8d (ext4: calculate and verify checksums of directory
    leaf blocks) and commit dbe89444 (ext4: Calculate and verify checksums
    for htree nodes) forgot to release the buffer in some places when the
    checksum check failed.

    Signed-off-by: Guo Chao
    Signed-off-by: "Theodore Ts'o"
    Reviewed-by: Darrick J. Wong

    Guo Chao
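A minimal sketch of the kind of leak being fixed, using a hypothetical refcounted buffer in place of a buffer_head. When verification fails, the reference taken by the read must be dropped before returning. The -EIO error code and all `demo_*` names are assumptions for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical refcounted buffer standing in for a buffer_head. */
struct demo_buf {
	int refcount;
	int checksum_ok;
};

static struct demo_buf *demo_bread(struct demo_buf *b)
{
	b->refcount++;
	return b;
}

static void demo_brelse(struct demo_buf *b)
{
	if (b)
		b->refcount--;
}

/*
 * Read a block and verify it. On checksum failure the reference taken
 * by demo_bread() is released before returning, so nothing leaks.
 */
static struct demo_buf *demo_read_verified(struct demo_buf *disk, int *err)
{
	struct demo_buf *bh = demo_bread(disk);

	if (!bh->checksum_ok) {
		demo_brelse(bh); /* release before failing */
		*err = -EIO;
		return NULL;
	}
	*err = 0;
	return bh;
}
```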
     

18 Dec, 2012

1 commit


11 Dec, 2012

2 commits


23 Jul, 2012

1 commit


30 Apr, 2012

1 commit

  • Calculate and verify the checksums for directory leaf blocks
    (i.e. blocks that only contain actual directory entries). The
    checksum lives in what looks to be an unused directory entry with a 0
    name_len at the end of the block. This scheme is not used for
    internal htree nodes because the mechanism in place there only costs
    one dx_entry, whereas the "empty" directory entry would cost two
    dx_entries.

    Signed-off-by: Darrick J. Wong
    Signed-off-by: "Theodore Ts'o"

    Darrick J. Wong
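A sketch of the checksum tail described above, modeled on ext4's struct ext4_dir_entry_tail: a 12-byte record that looks like a directory entry with inode 0 and name_len 0, marked with file type 0xDE, and followed by the checksum. As a simplification, the CRC32c here covers only the block contents, whereas ext4 also seeds it with per-filesystem and per-inode data. The `demo_*` helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Looks like an unused dirent (inode 0, name_len 0) to older code. */
struct demo_dirent_tail {
	uint32_t det_reserved_zero1; /* inode = 0 */
	uint16_t det_rec_len;        /* 12 */
	uint8_t  det_reserved_zero2; /* name_len = 0 */
	uint8_t  det_reserved_ft;    /* 0xDE marks a checksum tail */
	uint32_t det_checksum;
};

/* Bitwise CRC32c (Castagnoli, reflected polynomial 0x82F63B78). */
static uint32_t demo_crc32c(uint32_t crc, const unsigned char *p, size_t n)
{
	while (n--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Write the tail into the last 12 bytes of the block. */
static void demo_set_tail(unsigned char *block, size_t blocksize)
{
	struct demo_dirent_tail t = {
		0, sizeof(struct demo_dirent_tail), 0, 0xDE, 0
	};
	size_t off = blocksize - sizeof(t);

	t.det_checksum = demo_crc32c(~0u, block, off);
	memcpy(block + off, &t, sizeof(t));
}

/* Returns 1 if a tail is present and its checksum matches, else 0. */
static int demo_verify_tail(const unsigned char *block, size_t blocksize)
{
	struct demo_dirent_tail t;
	size_t off = blocksize - sizeof(t);

	memcpy(&t, block + off, sizeof(t));
	if (t.det_reserved_zero1 != 0 || t.det_rec_len != sizeof(t) ||
	    t.det_reserved_zero2 != 0 || t.det_reserved_ft != 0xDE)
		return 0; /* no checksum tail present */
	return t.det_checksum == demo_crc32c(~0u, block, off);
}
```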