15 Oct, 2016

2 commits


30 Aug, 2016

7 commits

  • Move loop to make enough space in the inode from
    ext4_expand_extra_isize_ea() into a separate function to make that
    function smaller and better readable and also to avoid delaration of
    variables inside a loop block.

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     
  • 'start' variable is completely unused in ext4_expand_extra_isize_ea().
    Variable 'first' is used only once in one place. So just remove them.
    Variables 'entry' and 'last' are only really used later in the function
    inside a loop. Move their declarations there.

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     
  • Factor out function for moving xattrs from inode into external xattr
    block from ext4_expand_extra_isize_ea(). That function is already quite
    long and factoring out this rather standalone functionality helps
    readability.

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     
  • We were checking whether computed offsets do not exceed end of block in
    ext4_xattr_shift_entries(). However this does not make sense since we
    always only decrease offsets. So replace that assertion with a check
    whether we really decrease xattrs value offsets.

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     
  • Currently we don't support xattrs with e_value_block set. We don't allow
    them to pass initial xattr check so there's no point for checking for
    this later. Since these tests were untested, bugs were creeping in and
    not all places which should have checked were checking e_value_block
    anyway.

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     
  • Currently we don't support xattrs with values stored out of line. Check
    for that in ext4_xattr_check_names() to make sure we never work with
    such xattrs since not all the code counts with that resulting is possible
    weird corruption issues.

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     
  • Conditions checking whether there is enough free space in an xattr block
    and when xattr is large enough to make enough space in the inode forgot
    to account for the fact that inode need not be completely filled up with
    xattrs. Thus we could move unnecessarily many xattrs out of inode or
    even falsely claim there is not enough space to expand the inode. We
    also forgot to update the amount of free space in xattr block when moving
    more xattrs and thus could decide to move too big xattr resulting in
    unexpected failure.

    Fix these problems by properly updating free space in the inode and
    xattr block as we move xattrs. To simplify the math, avoid shifting
    xattrs after removing each one xattr and instead just shift xattrs only
    once there is enough free space in the inode.

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     

12 Aug, 2016

2 commits

  • When we need to move xattrs into external xattr block, we call
    ext4_xattr_block_set() from ext4_expand_extra_isize_ea(). That may end
    up calling ext4_mark_inode_dirty() again which will recurse back into
    the inode expansion code leading to deadlocks.

    Protect from recursion using EXT4_STATE_NO_EXPAND inode flag and move
    its management into ext4_expand_extra_isize_ea() since its manipulation
    is safe there (due to xattr_sem) from possible races with
    ext4_xattr_set_handle() which plays with it as well.

    CC: stable@vger.kernel.org # 4.4.x
    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     
  • We did not count with the padding of xattr value when computing desired
    shift of xattrs in the inode when expanding i_extra_isize. As a result
    we could create unaligned start of inline xattrs. Account for alignment
    properly.

    CC: stable@vger.kernel.org # 4.4.x-
    Signed-off-by: Jan Kara

    Jan Kara
     

11 Aug, 2016

2 commits

  • When multiple xattrs need to be moved out of inode, we did not properly
    recompute total size of xattr headers in the inode and the new header
    position. Thus when moving the second and further xattr we asked
    ext4_xattr_shift_entries() to move too much and from the wrong place,
    resulting in possible xattr value corruption or general memory
    corruption.

    CC: stable@vger.kernel.org # 4.4.x
    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     
  • The code in ext4_expand_extra_isize_ea() treated new_extra_isize
    argument sometimes as the desired target i_extra_isize and sometimes as
    the amount by which we need to grow current i_extra_isize. These happen
    to coincide when i_extra_isize is 0 which used to be the common case and
    so nobody noticed this until recently when we added i_projid to the
    inode and so i_extra_isize now needs to grow from 28 to 32 bytes.

    The result of these bugs was that we sometimes unnecessarily decided to
    move xattrs out of inode even if there was enough space and we often
    ended up corrupting in-inode xattrs because arguments to
    ext4_xattr_shift_entries() were just wrong. This could demonstrate
    itself as BUG_ON in ext4_xattr_shift_entries() triggering.

    Fix the problem by introducing new isize_diff variable and use it where
    appropriate.

    CC: stable@vger.kernel.org # 4.4.x
    Reported-by: Dave Chinner
    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     

04 Jul, 2016

1 commit

  • We temporally change checksum fields in buffers of some types of
    metadata into '0' for verifying the checksum values. By doing this
    without locking the buffer, some metadata's checksums, which are
    being committed or written back to the storage, could be damaged.
    In our test, several metadata blocks were found with damaged metadata
    checksum value during recovery process. When we only verify the
    checksum value, we have to avoid modifying checksum fields directly.

    Signed-off-by: Daeho Jeong
    Signed-off-by: Youngjin Gil
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Darrick J. Wong

    Daeho Jeong
     

23 Mar, 2016

1 commit


23 Feb, 2016

4 commits

  • To reduce amount of damage caused by single bad block, we limit number
    of inodes sharing an xattr block to 1024. Thus there can be more xattr
    blocks with the same contents when there are lots of files with the same
    extended attributes. These xattr blocks naturally result in hash
    collisions and can form long hash chains and we unnecessarily check each
    such block only to find out we cannot use it because it is already
    shared by too many inodes.

    Add a reusable flag to cache entries which is cleared when a cache entry
    has reached its maximum refcount. Cache entries which are not marked
    reusable are skipped by mb_cache_entry_find_{first,next}. This
    significantly speeds up mbcache when there are many same xattr blocks.
    For example for xattr-bench with 5 values and each process handling
    20000 files, the run for 64 processes is 25x faster with this patch.
    Even for 8 processes the speedup is almost 3x. We have also verified
    that for situations where there is only one xattr block of each kind,
    the patch doesn't have a measurable cost.

    [JK: Remove handling of setting the same value since it is not needed
    anymore, check for races in e_reusable setting, improve changelog,
    add measurements]

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Andreas Gruenbacher
     
  • When someone tried to set xattr to the same value (i.e., not changing
    anything) we did all the work of removing original xattr, possibly
    breaking references to shared xattr block, inserting new xattr, and
    merging xattr blocks again. Since this is not so rare operation and it
    is relatively cheap for us to detect this case, check for this and
    shortcut xattr setting in that case.

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     
  • Since old mbcache code is gone, let's rename new code to mbcache since
    number 2 is now meaningless. This is just a mechanical replacement.

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     
  • The conversion is generally straightforward. The only tricky part is
    that xattr block corresponding to found mbcache entry can get freed
    before we get buffer lock for that block. So we have to check whether
    the entry is still valid after getting buffer lock.

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     

13 Jan, 2016

1 commit

  • Pull misc vfs updates from Al Viro:
    "All kinds of stuff. That probably should've been 5 or 6 separate
    branches, but by the time I'd realized how large and mixed that bag
    had become it had been too close to -final to play with rebasing.

    Some fs/namei.c cleanups there, memdup_user_nul() introduction and
    switching open-coded instances, burying long-dead code, whack-a-mole
    of various kinds, several new helpers for ->llseek(), assorted
    cleanups and fixes from various people, etc.

    One piece probably deserves special mention - Neil's
    lookup_one_len_unlocked(). Similar to lookup_one_len(), but gets
    called without ->i_mutex and tries to avoid ever taking it. That, of
    course, means that it's not useful for any directory modifications,
    but things like getting inode attributes in nfds readdirplus are fine
    with that. I really should've asked for moratorium on lookup-related
    changes this cycle, but since I hadn't done that early enough... I
    *am* asking for that for the coming cycle, though - I'm going to try
    and get conversion of i_mutex to rwsem with ->lookup() done under lock
    taken shared.

    There will be a patch closer to the end of the window, along the lines
    of the one Linus had posted last May - mechanical conversion of
    ->i_mutex accesses to inode_lock()/inode_unlock()/inode_trylock()/
    inode_is_locked()/inode_lock_nested(). To quote Linus back then:

    -----
    | This is an automated patch using
    |
    | sed 's/mutex_lock(&\(.*\)->i_mutex)/inode_lock(\1)/'
    | sed 's/mutex_unlock(&\(.*\)->i_mutex)/inode_unlock(\1)/'
    | sed 's/mutex_lock_nested(&\(.*\)->i_mutex,[ ]*I_MUTEX_\([A-Z0-9_]*\))/inode_lock_nested(\1, I_MUTEX_\2)/'
    | sed 's/mutex_is_locked(&\(.*\)->i_mutex)/inode_is_locked(\1)/'
    | sed 's/mutex_trylock(&\(.*\)->i_mutex)/inode_trylock(\1)/'
    |
    | with a very few manual fixups
    -----

    I'm going to send that once the ->i_mutex-affecting stuff in -next
    gets mostly merged (or when Linus says he's about to stop taking
    merges)"

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
    nfsd: don't hold i_mutex over userspace upcalls
    fs:affs:Replace time_t with time64_t
    fs/9p: use fscache mutex rather than spinlock
    proc: add a reschedule point in proc_readfd_common()
    logfs: constify logfs_block_ops structures
    fcntl: allow to set O_DIRECT flag on pipe
    fs: __generic_file_splice_read retry lookup on AOP_TRUNCATED_PAGE
    fs: xattr: Use kvfree()
    [s390] page_to_phys() always returns a multiple of PAGE_SIZE
    nbd: use ->compat_ioctl()
    fs: use block_device name vsprintf helper
    lib/vsprintf: add %*pg format specifier
    fs: use gendisk->disk_name where possible
    poll: plug an unused argument to do_poll
    amdkfd: don't open-code memdup_user()
    cdrom: don't open-code memdup_user()
    rsxx: don't open-code memdup_user()
    mtip32xx: don't open-code memdup_user()
    [um] mconsole: don't open-code memdup_user_nul()
    [um] hostaudio: don't open-code memdup_user()
    ...

    Linus Torvalds
     

07 Jan, 2016

1 commit


14 Dec, 2015

1 commit

  • Change the list operation to only return whether or not an attribute
    should be listed. Copying the attribute names into the buffer is moved
    to the callers.

    Since the result only depends on the dentry and not on the attribute
    name, we do not pass the attribute name to list operations.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Al Viro

    Andreas Gruenbacher
     

14 Nov, 2015

1 commit

  • The xattr_handler operations are currently all passed a file system
    specific flags value which the operations can use to disambiguate between
    different handlers; some file systems use that to distinguish the xattr
    namespace, for example. In some oprations, it would be useful to also have
    access to the handler prefix. To allow that, pass a pointer to the handler
    to operations instead of the flags value alone.

    Signed-off-by: Andreas Gruenbacher
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Andreas Gruenbacher
     

18 Oct, 2015

2 commits


27 Apr, 2015

1 commit

  • Pull fourth vfs update from Al Viro:
    "d_inode() annotations from David Howells (sat in for-next since before
    the beginning of merge window) + four assorted fixes"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    RCU pathwalk breakage when running into a symlink overmounting something
    fix I_DIO_WAKEUP definition
    direct-io: only inc/dec inode->i_dio_count for file systems
    fs/9p: fix readdir()
    VFS: assorted d_backing_inode() annotations
    VFS: fs/inode.c helpers: d_inode() annotations
    VFS: fs/cachefiles: d_backing_inode() annotations
    VFS: fs library helpers: d_inode() annotations
    VFS: assorted weird filesystems: d_inode() annotations
    VFS: normal filesystems (and lustre): d_inode() annotations
    VFS: security/: d_inode() annotations
    VFS: security/: d_backing_inode() annotations
    VFS: net/: d_inode() annotations
    VFS: net/unix: d_backing_inode() annotations
    VFS: kernel/: d_inode() annotations
    VFS: audit: d_backing_inode() annotations
    VFS: Fix up some ->d_inode accesses in the chelsio driver
    VFS: Cachefiles should perform fs modifications on the top layer only
    VFS: AF_UNIX sockets should call mknod on the top layer only

    Linus Torvalds
     

16 Apr, 2015

1 commit


03 Apr, 2015

2 commits


13 Oct, 2014

1 commit

  • Besides the fact that this replacement improves code readability
    it also protects from errors caused direct EXT4_S(sb)->s_es manipulation
    which may result attempt to use uninitialized csum machinery.

    #Testcase_BEGIN
    IMG=/dev/ram0
    MNT=/mnt
    mkfs.ext4 $IMG
    mount $IMG $MNT
    #Enable feature directly on disk, on mounted fs
    tune2fs -O metadata_csum $IMG
    # Provoke metadata update, likey result in OOPS
    touch $MNT/test
    umount $MNT
    #Testcase_END

    # Replacement script
    @@
    expression E;
    @@
    - EXT4_HAS_RO_COMPAT_FEATURE(E, EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)
    + ext4_has_metadata_csum(E)

    https://bugzilla.kernel.org/show_bug.cgi?id=82201

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: Theodore Ts'o
    Cc: stable@vger.kernel.org

    Dmitry Monakhov
     

17 Sep, 2014

1 commit

  • When loading extended attributes, check each entry's value offset to
    make sure it doesn't collide with the entries.

    Without this check it is easy to crash the kernel by mounting a
    malicious FS containing a file with an EA wherein e_value_offs = 0 and
    e_value_size > 0 and then deleting the EA, which corrupts the name
    list.

    (See the f_ea_value_crash test's FS image in e2fsprogs for an example.)

    Signed-off-by: Darrick J. Wong
    Signed-off-by: Theodore Ts'o
    Cc: stable@vger.kernel.org

    Darrick J. Wong
     

05 Sep, 2014

1 commit

  • The EXT4_STATE_DELALLOC_RESERVED flag was originally implemented
    because it was too hard to make sure the mballoc and get_block flags
    could be reliably passed down through all of the codepaths that end up
    calling ext4_mb_new_blocks().

    Since then, we have mb_flags passed down through most of the code
    paths, so getting rid of EXT4_STATE_DELALLOC_RESERVED isn't as tricky
    as it used to.

    This commit plumbs in the last of what is required, and then adds a
    WARN_ON check to make sure we haven't missed anything. If this passes
    a full regression test run, we can then drop
    EXT4_STATE_DELALLOC_RESERVED.

    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jan Kara

    Theodore Ts'o
     

13 May, 2014

2 commits


12 May, 2014

1 commit


07 Apr, 2014

1 commit

  • When heavily exercising xattr code the assertion that
    jbd2_journal_dirty_metadata() shouldn't return error was triggered:

    WARNING: at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1237
    jbd2_journal_dirty_metadata+0x1ba/0x260()

    CPU: 0 PID: 8877 Comm: ceph-osd Tainted: G W 3.10.0-ceph-00049-g68d04c9 #1
    Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
    ffffffff81a1d3c8 ffff880214469928 ffffffff816311b0 ffff880214469968
    ffffffff8103fae0 ffff880214469958 ffff880170a9dc30 ffff8802240fbe80
    0000000000000000 ffff88020b366000 ffff8802256e7510 ffff880214469978
    Call Trace:
    [] dump_stack+0x19/0x1b
    [] warn_slowpath_common+0x70/0xa0
    [] warn_slowpath_null+0x1a/0x20
    [] jbd2_journal_dirty_metadata+0x1ba/0x260
    [] __ext4_handle_dirty_metadata+0xa3/0x140
    [] ext4_xattr_release_block+0x103/0x1f0
    [] ext4_xattr_block_set+0x1e0/0x910
    [] ext4_xattr_set_handle+0x38b/0x4a0
    [] ? trace_hardirqs_on+0xd/0x10
    [] ext4_xattr_set+0xc2/0x140
    [] ext4_xattr_user_set+0x47/0x50
    [] generic_setxattr+0x6e/0x90
    [] __vfs_setxattr_noperm+0x7b/0x1c0
    [] vfs_setxattr+0xc4/0xd0
    [] setxattr+0x13e/0x1e0
    [] ? __sb_start_write+0xe7/0x1b0
    [] ? mnt_want_write_file+0x28/0x60
    [] ? fget_light+0x3c/0x130
    [] ? mnt_want_write_file+0x28/0x60
    [] ? __mnt_want_write+0x58/0x70
    [] SyS_fsetxattr+0xbe/0x100
    [] system_call_fastpath+0x16/0x1b

    The reason for the warning is that buffer_head passed into
    jbd2_journal_dirty_metadata() didn't have journal_head attached. This is
    caused by the following race of two ext4_xattr_release_block() calls:

    CPU1 CPU2
    ext4_xattr_release_block() ext4_xattr_release_block()
    lock_buffer(bh);
    /* False */
    if (BHDR(bh)->h_refcount == cpu_to_le32(1))
    } else {
    le32_add_cpu(&BHDR(bh)->h_refcount, -1);
    unlock_buffer(bh);
    lock_buffer(bh);
    /* True */
    if (BHDR(bh)->h_refcount == cpu_to_le32(1))
    get_bh(bh);
    ext4_free_blocks()
    ...
    jbd2_journal_forget()
    jbd2_journal_unfile_buffer()
    -> JH is gone
    error = ext4_handle_dirty_xattr_block(handle, inode, bh);
    -> triggers the warning

    We fix the problem by moving ext4_handle_dirty_xattr_block() under the
    buffer lock. Sadly this cannot be done in nojournal mode as that
    function can call sync_dirty_buffer() which would deadlock. Luckily in
    nojournal mode the race is harmless (we only dirty already freed buffer)
    and thus for nojournal mode we leave the dirtying outside of the buffer
    lock.

    Reported-by: Sage Weil
    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"
    Cc: stable@vger.kernel.org

    Jan Kara
     

19 Mar, 2014

1 commit

  • This patch adds new interfaces to create and destory cache,
    ext4_xattr_create_cache() and ext4_xattr_destroy_cache(), and remove
    the cache creation and destory calls from ex4_init_xattr() and
    ext4_exitxattr() in fs/ext4/xattr.c.

    fs/ext4/super.c has been changed so that when a filesystem is mounted
    a cache is allocated and attched to its ext4_sb_info structure.

    fs/mbcache.c has been changed so that only one slab allocator is
    allocated and used by all mbcache structures.

    Signed-off-by: T. Makphaibulchoke

    T Makphaibulchoke
     

20 Feb, 2014

1 commit

  • The function ext4_expand_extra_isize_ea() doesn't need the size of all
    of the extended attribute headers. So if we don't calculate it when
    it is unneeded, it we can skip some undeeded memory references, and as
    a bonus, we eliminate some kvetching by static code analysis tools.

    Addresses-Coverity-Id: #741291

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

26 Jan, 2014

1 commit


01 Nov, 2013

1 commit