11 Apr, 2016

1 commit

  • This reverts commit 1028b55bafb7611dda1d8fed2aeca16a436b7dff.

    It's broken: it makes ext4 return an error at an invalid point, causing
    the readdir wrappers to write the position of the last successful
    directory entry into the position field, which means that the next
    readdir will now return that last successful entry _again_.

    You can only return fatal errors (that terminate the readdir directory
    walk) from within the filesystem readdir functions, the "normal" errors
    (that happen when the readdir buffer fills up, for example) happen in
    the iterator where we know the position of the actual failing entry.
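
    The effect described above can be reproduced in a small user-space toy
    model. This is only a sketch: toy_ctx, toy_emit() and toy_readdir() are
    hypothetical names, not kernel APIs, and the model assumes that the
    position names the entry being processed and only moves on when the walk
    steps to the next entry.

```c
#include <stddef.h>

/* Toy model only: none of these names are kernel APIs. */
#define NENTRIES 4
#define ERR_INTERRUPTED (-1)

struct toy_ctx {
    int pos;      /* resume position handed back to the caller */
    int out[16];  /* entries delivered to "user space" */
    int nout;     /* number of entries delivered so far */
    int room;     /* how many entries still fit in the buffer */
};

/* Iterator-side emit: fails "normally" when the buffer is full. */
static int toy_emit(struct toy_ctx *ctx, int entry)
{
    if (ctx->room == 0)
        return 0;                   /* buffer full: the walk must stop */
    ctx->out[ctx->nout++] = entry;
    ctx->room--;
    return 1;
}

/*
 * Filesystem-side walk. If bail_after >= 0, the walk returns an error
 * right after emitting that entry, before pos has moved past it.
 */
int toy_readdir(struct toy_ctx *ctx, int bail_after)
{
    while (ctx->pos < NENTRIES) {
        if (!toy_emit(ctx, ctx->pos))
            return 0;               /* pos names the entry that did not fit */
        if (ctx->pos == bail_after)
            return ERR_INTERRUPTED; /* pos still names an emitted entry */
        ctx->pos++;
    }
    return 0;
}
```

    In this model, stopping because the buffer is full leaves pos at the
    entry that was not delivered, so the next call resumes cleanly. Bailing
    out with an error after an emit leaves pos at an entry that was already
    delivered, so the next call delivers it a second time.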

    I do have a very different patch that does the "signal_pending()"
    handling inside the iterator function where it is allowable, but while
    that one passes all the sanity checks, I screwed up something like four
    times while emailing it out, so I'm not going to commit it today.

    So my track record is not good enough, and the stars will have to align
    better before that one gets committed. And it would be good to get some
    review too, of course, since celestial alignments are always an iffy
    debugging model.

    IOW, let's just revert the commit that caused the problem for now.

    Reported-by: Greg Thelen
    Cc: Theodore Ts'o
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

10 Apr, 2016

2 commits

  • Pull btrfs fixes from Chris Mason:
    "These are bug fixes, including a really old fsync bug, and a few trace
    points to help us track down problems in the quota code"

    * 'for-linus-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    Btrfs: fix file/data loss caused by fsync after rename and new inode
    btrfs: Reset IO error counters before start of device replacing
    btrfs: Add qgroup tracing
    Btrfs: don't use src fd for printk
    btrfs: fallback to vmalloc in btrfs_compare_tree
    btrfs: handle non-fatal errors in btrfs_qgroup_inherit()
    btrfs: Output more info for enospc_debug mount option
    Btrfs: fix invalid reference in replace_path
    Btrfs: Improve FL_KEEP_SIZE handling in fallocate

    Linus Torvalds
     
  • Pull orangefs fixes from Mike Marshall:
    "Orangefs cleanups and a strncpy vulnerability fix.

    Cleanups:
    - remove an unused variable from orangefs_readdir.
    - clean up printk wrapper used for ofs "gossip" debugging.
    - clean up truncate ctime and mtime setting in inode.c
    - remove a useless null check found by coccinelle.
    - optimize some memcpy/memset boilerplate code.
    - remove some useless sanity checks from xattr.c

    Fix:
    - fix a potential strncpy vulnerability"

    * tag 'for-linus-4.6-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
    orangefs: remove unused variable
    orangefs: Add KERN_ to gossip_ macros
    orangefs: strncpy -> strscpy
    orangefs: clean up truncate ctime and mtime setting
    Orangefs: fix ifnullfree.cocci warnings
    Orangefs: optimize boilerplate code.
    Orangefs: xattr.c cleanup

    Linus Torvalds
     

09 Apr, 2016

7 commits

  • Signed-off-by: Martin Brandenburg
    Signed-off-by: Mike Marshall

    Martin Brandenburg
     
  • Emit the logging messages at the appropriate levels.

    Miscellanea:

    o Change format to fmt
    o Use the more common ##__VA_ARGS__

    Signed-off-by: Joe Perches
    Signed-off-by: Mike Marshall

    Joe Perches
     
  • It would have been possible for a rogue client-core to send in a symlink
    target which is not NUL terminated. This returns EIO if the client-core
    gives us corrupt data.
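
    A user-space sketch of the same idea follows. copy_symlink_target() is
    a hypothetical helper, not the orangefs code, and it uses memchr()
    because strscpy() is a kernel-only function. It refuses any source
    buffer that carries no NUL terminator instead of copying it blindly.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: copy a possibly unterminated source of src_len
 * bytes into dst (dst_len bytes). Returns 0 on success, -EIO if the
 * source has no NUL terminator or does not fit.
 */
int copy_symlink_target(char *dst, size_t dst_len,
                        const char *src, size_t src_len)
{
    const char *nul = memchr(src, '\0', src_len);
    size_t n;

    if (nul == NULL)
        return -EIO;                /* no terminator: treat as corrupt */

    n = (size_t)(nul - src) + 1;    /* string length including the NUL */
    if (n > dst_len)
        return -EIO;                /* would not fit in the destination */

    memcpy(dst, src, n);
    return 0;
}
```

    With a four-byte source holding 'a', 'b', 'c', 'd' and no terminator the
    helper returns -EIO, whereas strncpy() would have filled the destination
    without terminating it.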

    Leave debugfs and superblock code as is for now.

    Other dcache.c and namei.c strncpy instances are safe because
    ORANGEFS_NAME_MAX = NAME_MAX + 1; there is always enough space for a
    name plus a NUL byte.

    Signed-off-by: Martin Brandenburg
    Signed-off-by: Mike Marshall

    Martin Brandenburg
     
  • The ctime and mtime are always updated on a successful ftruncate and
    only updated on a successful truncate where the size changed.

    We handle the ``if the size changed'' bit.

    This matches FUSE's behavior.

    Signed-off-by: Martin Brandenburg
    Signed-off-by: Mike Marshall

    Martin Brandenburg
     
  • fs/orangefs/orangefs-debugfs.c:130:2-26: WARNING: NULL check before freeing functions like kfree, debugfs_remove, debugfs_remove_recursive or usb_free_urb is not needed. Maybe consider reorganizing relevant code to avoid passing NULL values.

    NULL check before some freeing functions is not needed.

    Based on checkpatch warning
    "kfree(NULL) is safe this check is probably not required"
    and kfreeaddr.cocci by Julia Lawall.

    Generated by: scripts/coccinelle/free/ifnullfree.cocci

    Signed-off-by: Fengguang Wu
    Signed-off-by: Mike Marshall

    kbuild test robot
     
  • Suggested by David Binderman
    The former can potentially be a performance win over the latter.

    memcpy(d, s, len);
    memset(d+len, c, size-len);

    memset(d, c, size);
    memcpy(d, s, len);
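
    The two sequences can be compared side by side in user space. The
    function names below are made up for this sketch. Both variants assume
    len <= size and leave the buffer in the same state; the first one writes
    every byte once, the second writes the first len bytes twice.

```c
#include <string.h>

/* Copy, then pad the tail: each byte of d is written exactly once. */
void fill_copy_then_pad(char *d, const char *s, size_t len, int c, size_t size)
{
    memcpy(d, s, len);
    memset(d + len, c, size - len);
}

/* Fill everything, then copy: the first len bytes are written twice. */
void fill_pad_then_copy(char *d, const char *s, size_t len, int c, size_t size)
{
    memset(d, c, size);
    memcpy(d, s, len);
}
```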

    Signed-off-by: Mike Marshall

    Mike Marshall
     
  • 1. It is nonsense to test for negative size_t, suggested by
    David Binderman

    2. By the time Orangefs gets called, the vfs has ensured that
    name != NULL, and that buffer and size are sane.
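
    To illustrate point 1 with a sketch (wraps_to_huge() is a hypothetical
    helper, not part of xattr.c): a negative length converted to size_t does
    not stay negative, it becomes a very large value, so a "size < 0" test
    on a size_t can never be true.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Returns 1 when a negative signed length, once converted to size_t,
 * shows up as a huge unsigned value instead of a negative one.
 */
int wraps_to_huge(long signed_len)
{
    size_t size = (size_t)signed_len;

    return signed_len < 0 && size > (SIZE_MAX / 2);
}
```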

    Signed-off-by: Mike Marshall

    Mike Marshall
     

08 Apr, 2016

1 commit

  • Pull ext4 bugfixes from Ted Ts'o:
    "These changes contain a fix for overlayfs interacting with some
    (badly behaved) dentry code in various file systems. These have been
    reviewed by Al and the respective file system maintainers and are going
    through the ext4 tree for convenience.

    This also has a few ext4 encryption bug fixes that were discovered in
    Android testing (yes, we will need to get these sync'ed up with the
    fs/crypto code; I'll take care of that). It also has some bug fixes
    and a change to ignore the legacy quota options to allow for xfstests
    regression testing of ext4's internal quota feature and to be more
    consistent with how xfs handles this case"

    * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    ext4: ignore quota mount options if the quota feature is enabled
    ext4 crypto: fix some error handling
    ext4: avoid calling dquot_get_next_id() if quota is not enabled
    ext4: retry block allocation for failed DIO and DAX writes
    ext4: add lockdep annotations for i_data_sem
    ext4: allow readdir()'s of large empty directories to be interrupted
    btrfs: fix crash/invalid memory access on fsync when using overlayfs
    ext4 crypto: use dget_parent() in ext4_d_revalidate()
    ext4: use file_dentry()
    ext4: use dget_parent() in ext4_file_open()
    nfs: use file_dentry()
    fs: add file_dentry()
    ext4 crypto: don't let data integrity writebacks fail with ENOMEM
    ext4: check if in-inode xattr is corrupted in ext4_expand_extra_isize_ea()

    Linus Torvalds
     

07 Apr, 2016

1 commit

  • If we rename an inode A (be it a file or a directory), create a new
    inode B with the old name of inode A and under the same parent directory,
    fsync inode B and then power fail, at log tree replay time we end up
    removing inode A completely. If inode A is a directory then all its files
    are gone too.

    Example scenarios where this happens, taken from a couple of test cases
    written for fstests which are going to be submitted upstream soon:

    # Scenario 1

    mkfs.btrfs -f /dev/sdc
    mount /dev/sdc /mnt
    mkdir -p /mnt/a/x
    echo "hello" > /mnt/a/x/foo
    echo "world" > /mnt/a/x/bar
    sync
    mv /mnt/a/x /mnt/a/y
    mkdir /mnt/a/x
    xfs_io -c fsync /mnt/a/x

    The next time the fs is mounted, log tree replay happens and
    the directory "y" does not exist nor do the files "foo" and
    "bar" exist anywhere (neither in "y" nor in "x", nor the root
    nor anywhere).

    # Scenario 2

    mkfs.btrfs -f /dev/sdc
    mount /dev/sdc /mnt
    mkdir /mnt/a
    echo "hello" > /mnt/a/foo
    sync
    mv /mnt/a/foo /mnt/a/bar
    echo "world" > /mnt/a/foo
    xfs_io -c fsync /mnt/a/foo

    The next time the fs is mounted, log tree replay happens and the
    file "bar" does not exist anymore. A file with the name "foo"
    exists and it matches the second file we created.

    Another related problem that does not involve file/data loss is when a
    new inode is created with the name of a deleted snapshot and we fsync it:

    mkfs.btrfs -f /dev/sdc
    mount /dev/sdc /mnt
    mkdir /mnt/testdir
    btrfs subvolume snapshot /mnt /mnt/testdir/snap
    btrfs subvolume delete /mnt/testdir/snap
    rmdir /mnt/testdir
    mkdir /mnt/testdir
    xfs_io -c fsync /mnt/testdir # or fsync some file inside /mnt/testdir

    The next time the fs is mounted the log replay procedure fails because
    it attempts to delete the snapshot entry (which has dir item key type
    of BTRFS_ROOT_ITEM_KEY) as if it were a regular (non-root) entry,
    resulting in the following error that causes mount to fail:

    [52174.510532] BTRFS info (device dm-0): failed to delete reference to snap, inode 257 parent 257
    [52174.512570] ------------[ cut here ]------------
    [52174.513278] WARNING: CPU: 12 PID: 28024 at fs/btrfs/inode.c:3986 __btrfs_unlink_inode+0x178/0x351 [btrfs]()
    [52174.514681] BTRFS: Transaction aborted (error -2)
    [52174.515630] Modules linked in: btrfs dm_flakey dm_mod overlay crc32c_generic ppdev xor raid6_pq acpi_cpufreq parport_pc tpm_tis sg parport tpm evdev i2c_piix4 proc
    [52174.521568] CPU: 12 PID: 28024 Comm: mount Tainted: G W 4.5.0-rc6-btrfs-next-27+ #1
    [52174.522805] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
    [52174.524053] 0000000000000000 ffff8801df2a7710 ffffffff81264e93 ffff8801df2a7758
    [52174.524053] 0000000000000009 ffff8801df2a7748 ffffffff81051618 ffffffffa03591cd
    [52174.524053] 00000000fffffffe ffff88015e6e5000 ffff88016dbc3c88 ffff88016dbc3c88
    [52174.524053] Call Trace:
    [52174.524053] [] dump_stack+0x67/0x90
    [52174.524053] [] warn_slowpath_common+0x99/0xb2
    [52174.524053] [] ? __btrfs_unlink_inode+0x178/0x351 [btrfs]
    [52174.524053] [] warn_slowpath_fmt+0x48/0x50
    [52174.524053] [] __btrfs_unlink_inode+0x178/0x351 [btrfs]
    [52174.524053] [] ? iput+0xb0/0x284
    [52174.524053] [] btrfs_unlink_inode+0x1c/0x3d [btrfs]
    [52174.524053] [] check_item_in_log+0x1fe/0x29b [btrfs]
    [52174.524053] [] replay_dir_deletes+0x167/0x1cf [btrfs]
    [52174.524053] [] fixup_inode_link_count+0x289/0x2aa [btrfs]
    [52174.524053] [] fixup_inode_link_counts+0xcb/0x105 [btrfs]
    [52174.524053] [] btrfs_recover_log_trees+0x258/0x32c [btrfs]
    [52174.524053] [] ? replay_one_extent+0x511/0x511 [btrfs]
    [52174.524053] [] open_ctree+0x1dd4/0x21b9 [btrfs]
    [52174.524053] [] btrfs_mount+0x97e/0xaed [btrfs]
    [52174.524053] [] ? trace_hardirqs_on+0xd/0xf
    [52174.524053] [] mount_fs+0x67/0x131
    [52174.524053] [] vfs_kern_mount+0x6c/0xde
    [52174.524053] [] btrfs_mount+0x1ac/0xaed [btrfs]
    [52174.524053] [] ? trace_hardirqs_on+0xd/0xf
    [52174.524053] [] ? lockdep_init_map+0xb9/0x1b3
    [52174.524053] [] mount_fs+0x67/0x131
    [52174.524053] [] vfs_kern_mount+0x6c/0xde
    [52174.524053] [] do_mount+0x8a6/0x9e8
    [52174.524053] [] ? strndup_user+0x3f/0x59
    [52174.524053] [] SyS_mount+0x77/0x9f
    [52174.524053] [] entry_SYSCALL_64_fastpath+0x12/0x6b
    [52174.561288] ---[ end trace 6b53049efb1a3ea6 ]---

    Fix this by forcing a transaction commit when such cases happen.
    This means we check in the commit root of the subvolume tree if there
    was any other inode with the same reference when the inode we are
    fsync'ing is a new inode (created in the current transaction).

    Test cases for fstests, covering all the scenarios given above, were
    submitted upstream for fstests:

    * fstests: generic test for fsync after renaming directory
    https://patchwork.kernel.org/patch/8694281/

    * fstests: generic test for fsync after renaming file
    https://patchwork.kernel.org/patch/8694301/

    * fstests: add btrfs test for fsync after snapshot deletion
    https://patchwork.kernel.org/patch/8670671/

    Cc: stable@vger.kernel.org
    Signed-off-by: Filipe Manana
    Signed-off-by: Chris Mason

    Filipe Manana
     

05 Apr, 2016

5 commits

  • Pull quota fixes from Jan Kara:
    "Fixes for oopses when the new quotactl gets used with quotas disabled"

    * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
    ocfs2: Fix Q_GETNEXTQUOTA for filesystem without quotas
    quota: Handle Q_GETNEXTQUOTA when quota is disabled

    Linus Torvalds
     
  • Pull f2fs fixes from Jaegeuk Kim.

    * tag 'f2fs-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
    f2fs: retrieve IO write stat from the right place
    f2fs crypto: fix corrupted symlink in encrypted case
    f2fs: cover large section in sanity check of super

    Linus Torvalds
     
  • Merge PAGE_CACHE_SIZE removal patches from Kirill Shutemov:
    "PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
    ago with the promise that one day it would be possible to implement the
    page cache with bigger chunks than PAGE_SIZE.

    This promise never materialized, and it is unlikely that it ever will.

    Let's stop pretending that pages in page cache are special. They are
    not.

    The first patch with most changes has been done with coccinelle. The
    second is manual fixups on top.

    The third patch removes macros definition"

    [ I was planning to apply this just before rc2, but then I spaced out,
    so here it is right _after_ rc2 instead.

    As Kirill suggested as a possibility, I could have decided to only
    merge the first two patches, and leave the old interfaces for
    compatibility, but I'd rather get it all done and any out-of-tree
    modules and patches can trivially do the conversion while still also
    working with older kernels, so there is little reason to try to
    maintain the redundant legacy model. - Linus ]

    * PAGE_CACHE_SIZE-removal:
    mm: drop PAGE_CACHE_* and page_cache_{get,release} definition
    mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage
    mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros

    Linus Torvalds
     
  • Mostly direct substitution with occasional adjustment or removing
    outdated comments.

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
    ago with the promise that one day it would be possible to implement the
    page cache with bigger chunks than PAGE_SIZE.

    This promise never materialized, and it is unlikely that it ever will.

    We have many places where PAGE_CACHE_SIZE is assumed to be equal to
    PAGE_SIZE. And it's a constant source of confusion whether a
    PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
    especially on the border between fs and mm.

    Globally switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
    breakage to be doable.

    Let's stop pretending that pages in page cache are special. They are
    not.

    The changes are pretty straightforward:

    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> E;

    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> E;

    - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

    - page_cache_get() -> get_page();

    - page_cache_release() -> put_page();
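
    For code that has to build both before and after this conversion, the
    substitutions listed above can be expressed as a small compatibility
    shim. This is only a user-space sketch: it assumes 4 KiB pages, and
    old_index_to_page() and new_index_to_page() are hypothetical helpers
    that exist purely to show that the removed shift was a shift by zero.

```c
/* Assumption for this user-space sketch: 4 KiB pages. In a kernel
 * build PAGE_SHIFT and PAGE_SIZE come from the architecture headers. */
#ifndef PAGE_SHIFT
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#endif

/* Compatibility shim: map the old names onto the new ones when the
 * old definitions are no longer provided. */
#ifndef PAGE_CACHE_SHIFT
#define PAGE_CACHE_SHIFT          PAGE_SHIFT
#define PAGE_CACHE_SIZE           PAGE_SIZE
/* Never expanded in this sketch; shown only for completeness. */
#define page_cache_get(page)      get_page(page)
#define page_cache_release(page)  put_page(page)
#endif

/* Old style: the shift amount is PAGE_CACHE_SHIFT - PAGE_SHIFT == 0. */
unsigned long old_index_to_page(unsigned long index)
{
    return index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
}

/* New style after the conversion: the shift simply disappears. */
unsigned long new_index_to_page(unsigned long index)
{
    return index;
}
```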

    This patch contains automated changes generated with coccinelle using
    the script below. For some reason, coccinelle doesn't patch header files.
    I've called spatch for them manually.

    The only adjustment after coccinelle is a revert of the changes to the
    PAGE_CACHE_ALIGN definition: we are going to drop it later.

    There are a few places in the code that coccinelle didn't reach. I'll
    fix them manually in a separate patch. Comments and documentation will
    also be addressed in a separate patch.

    virtual patch

    @@
    expression E;
    @@
    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    expression E;
    @@
    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    @@
    - PAGE_CACHE_SHIFT
    + PAGE_SHIFT

    @@
    @@
    - PAGE_CACHE_SIZE
    + PAGE_SIZE

    @@
    @@
    - PAGE_CACHE_MASK
    + PAGE_MASK

    @@
    expression E;
    @@
    - PAGE_CACHE_ALIGN(E)
    + PAGE_ALIGN(E)

    @@
    expression E;
    @@
    - page_cache_get(E)
    + get_page(E)

    @@
    expression E;
    @@
    - page_cache_release(E)
    + put_page(E)

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

04 Apr, 2016

9 commits

  • If a device replace entry is found on disk at mount time and its
    num_write_errors stats counter has a non-zero value, the replace
    operation will never finish and btrfs_scrub_dev() will report -EIO,
    because this counter is never reset.

    # mount -o degraded /media/a4fb5c0a-21c5-4fe7-8d0e-fdd87d5f71ee/
    # btrfs replace status /media/a4fb5c0a-21c5-4fe7-8d0e-fdd87d5f71ee/
    Started on 25.Mar 07:28:00, canceled on 25.Mar 07:28:01 at 0.0%, 40 write errs, 0 uncorr. read errs
    # btrfs replace start -B 4 /dev/sdg /media/a4fb5c0a-21c5-4fe7-8d0e-fdd87d5f71ee/
    ERROR: ioctl(DEV_REPLACE_START) failed on "/media/a4fb5c0a-21c5-4fe7-8d0e-fdd87d5f71ee/": Input/output error, no error

    Reset the num_write_errors and num_uncorrectable_read_errors counters in
    the dev_replace structure before the replace starts.
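
    A toy model of the fix, as a sketch only: the two field names follow
    this commit message, while struct toy_dev_replace, toy_replace_start()
    and toy_replace_result() are hypothetical and do not mirror the btrfs
    code. The real counters may well use a different type.

```c
#include <errno.h>

/* Toy stand-in for the replace state. */
struct toy_dev_replace {
    unsigned long long num_write_errors;
    unsigned long long num_uncorrectable_read_errors;
};

/* Start a new replace: stale counters from an earlier, cancelled run
 * must not leak into this one. */
void toy_replace_start(struct toy_dev_replace *r)
{
    r->num_write_errors = 0;
    r->num_uncorrectable_read_errors = 0;
}

/* The replace is reported as failed if any write error was counted. */
int toy_replace_result(const struct toy_dev_replace *r)
{
    return r->num_write_errors ? -EIO : 0;
}
```

    With 40 stale write errors loaded from disk, toy_replace_result()
    returns -EIO until toy_replace_start() has cleared the counters.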

    Signed-off-by: Yauhen Kharuzhy
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Yauhen Kharuzhy
     
  • This patch adds tracepoints to the qgroup code on both the reporting side
    (insert_dirty_extents) and the accounting side. Taken together, they
    allow us to see what qgroup operations have happened and what their
    results were.

    Signed-off-by: Mark Fasheh
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Mark Fasheh
     
  • The fd we pass in may not be on a btrfs file system, so don't try to do
    BTRFS_I() on it. Thanks,

    Signed-off-by: Josef Bacik
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Josef Bacik
     
  • The allocation of node could fail if the memory is too fragmented for a
    given node size, practically observed with 64k.

    http://article.gmane.org/gmane.comp.file-systems.btrfs/54689

    Reported-and-tested-by: Jean-Denis Girard
    Signed-off-by: David Sterba

    David Sterba
     
  • create_pending_snapshot() will go readonly on _any_ error return from
    btrfs_qgroup_inherit(). If qgroups are enabled, a user can crash their fs by
    just making a snapshot and asking it to inherit from an invalid qgroup. For
    example:

    $ btrfs sub snap -i 1/10 /btrfs/ /btrfs/foo

    Will cause a transaction abort.

    Fix this by only throwing errors in btrfs_qgroup_inherit() when we know
    going readonly is acceptable.

    The following xfstests test case reproduces this bug:

    seq=`basename $0`
    seqres=$RESULT_DIR/$seq
    echo "QA output created by $seq"

    here=`pwd`
    tmp=/tmp/$$
    status=1 # failure is the default!
    trap "_cleanup; exit \$status" 0 1 2 3 15

    _cleanup()
    {
    cd /
    rm -f $tmp.*
    }

    # get standard environment, filters and checks
    . ./common/rc
    . ./common/filter

    # remove previous $seqres.full before test
    rm -f $seqres.full

    # real QA test starts here
    _supported_fs btrfs
    _supported_os Linux
    _require_scratch

    rm -f $seqres.full

    _scratch_mkfs
    _scratch_mount
    _run_btrfs_util_prog quota enable $SCRATCH_MNT
    # The qgroup '1/10' does not exist and should be silently ignored
    _run_btrfs_util_prog subvolume snapshot -i 1/10 $SCRATCH_MNT $SCRATCH_MNT/snap1

    _scratch_unmount

    echo "Silence is golden"

    status=0
    exit

    Signed-off-by: Mark Fasheh
    Reviewed-by: Qu Wenruo
    Signed-off-by: David Sterba

    Mark Fasheh
     
  • Since a user on the mailing list reported a reproducible balance ENOSPC
    error, it's better to add more debug info for the enospc_debug mount
    option.

    Reported-by: Marc Haber
    Signed-off-by: Qu Wenruo
    Signed-off-by: David Sterba

    Qu Wenruo
     
  • Dan Carpenter's static checker found this error, which was introduced by
    commit 64c043de466d
    ("Btrfs: fix up read_tree_block to return proper error")

    It's really supposed to 'break' the loop on error like the others.

    Cc: Dan Carpenter
    Reported-by: Dan Carpenter
    Signed-off-by: Liu Bo
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Liu Bo
     
  • - We call inode_size_ok() only if FL_KEEP_SIZE isn't specified.
    - As an optimisation we can skip the call if (off + len)
    isn't greater than the current size of the file. This operation
    is called under the lock, so the less work we do, the better.
    - If we call inode_size_ok(), pass it the correct value rather
    than a more conservative estimate.
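
    The first two points amount to a simple predicate, sketched here in user
    space. TOY_KEEP_SIZE stands in for the FL_KEEP_SIZE flag named above and
    needs_size_check() is a hypothetical helper, not the btrfs code.

```c
#include <stdbool.h>

#define TOY_KEEP_SIZE 0x01  /* stand-in for the keep-size fallocate flag */

/* Decide whether the new-size check needs to run at all. */
bool needs_size_check(int mode, long long off, long long len,
                      long long cur_size)
{
    if (mode & TOY_KEEP_SIZE)
        return false;               /* the file size will not change */

    return off + len > cur_size;    /* only growth needs checking */
}
```

    When the check does run, the value passed to it would be off + len,
    which is the third point of the list.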

    Signed-off-by: Davide Italiano
    Reviewed-by: Liu Bo
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Davide Italiano
     
  • Previously, ext4 would fail the mount if the file system had the quota
    feature enabled and quota mount options (used for the older quota
    setups) were present. This broke xfstests, since xfs silently ignores
    the usrquota and grpquota mount options if they are specified. This
    commit changes things so that we are consistent with xfs; having the
    mount options specified is harmless, so there is no sense in breaking
    users by forbidding them.

    Cc: stable@vger.kernel.org
    Signed-off-by: Theodore Ts'o

    Theodore Ts'o
     

03 Apr, 2016

1 commit


02 Apr, 2016

3 commits


01 Apr, 2016

4 commits

  • Currently, if block allocation for a DIO or DAX write fails due to ENOSPC,
    we just return the error to userspace. However, these ENOSPC errors can
    be transient because the transaction freeing blocks has not yet
    committed. This shows up as failures of the generic/102 test when the
    filesystem is mounted with the 'dax' mount option.

    Fix the problem by properly retrying the allocation in case of ENOSPC
    error in get blocks functions used for direct IO.
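
    The retry pattern can be sketched in user space as follows. Everything
    here is hypothetical: toy_alloc_fn, alloc_with_retry() and the bound of
    three retries are assumptions for illustration and are not taken from
    the ext4 code.

```c
#include <errno.h>

#define TOY_MAX_RETRIES 3   /* assumption: small bounded retry count */

/* Hypothetical allocator callback: returns 0 or a negative errno. */
typedef int (*toy_alloc_fn)(void *state);

/* Retry the allocation while it fails with ENOSPC, up to the bound. */
int alloc_with_retry(toy_alloc_fn alloc, void *state)
{
    int ret;
    int retries = 0;

    do {
        ret = alloc(state);
    } while (ret == -ENOSPC && retries++ < TOY_MAX_RETRIES);

    return ret;
}

/* Example allocator: fails with ENOSPC until *state counts down to 0,
 * modelling blocks that become free once a transaction commits. */
int toy_transient_alloc(void *state)
{
    int *pending = state;

    if (*pending > 0) {
        (*pending)--;
        return -ENOSPC;
    }
    return 0;
}
```

    A transient shortage that clears within the bound succeeds, while a
    persistent one still surfaces ENOSPC to the caller after the last retry.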

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o
    Tested-by: Ross Zwisler

    Jan Kara
     
  • With the internal Quota feature, mke2fs creates empty quota inodes and
    quota usage tracking is enabled as soon as the file system is mounted.
    Since quotacheck is no longer preallocating all of the blocks in the
    quota inode that are likely needed to be written to, we are now seeing
    a lockdep false positive caused by needing to allocate a quota block
    from inside ext4_map_blocks(), while holding i_data_sem for a data
    inode. This results in this complaint:

    Possible unsafe locking scenario:

    CPU0 CPU1
    ---- ----
    lock(&ei->i_data_sem);
    lock(&s->s_dquot.dqio_mutex);
    lock(&ei->i_data_sem);
    lock(&s->s_dquot.dqio_mutex);

    Google-Bug-Id: 27907753

    Signed-off-by: Theodore Ts'o
    Cc: stable@vger.kernel.org

    Theodore Ts'o
     
  • Version 2.9.4 isn't even released yet.

    Signed-off-by: Martin Brandenburg

    Martin Brandenburg
     
  • This was quite an oversight. After a readdir, the module could not be
    unloaded, the number of slots is wrong, and memory near the slot bitmap
    is possibly corrupt. Oops.

    Signed-off-by: Martin Brandenburg

    Martin Brandenburg
     

31 Mar, 2016

6 commits

  • Pull vfs fix from Al Viro.

    Automount handling was broken by commit e3c13928086f ("namei: massage
    lookup_slow() to be usable by lookup_one_len_unlocked()") moving the
    test for negative dentry too early.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fix the braino in "namei: massage lookup_slow() to be usable by lookup_one_len_unlocked()"

    Linus Torvalds
     
  • We should try to trigger automount *before* bailing out on negative dentry.

    Reported-by: Jun'ichi Nomura
    Reported-by: Arend van Spriel
    Tested-by: Arend van Spriel
    Tested-by: Jun'ichi Nomura
    Signed-off-by: Al Viro

    Al Viro
     
  • If a directory has a large number of empty blocks, iterating over all
    of them can take a long time, leading to scheduler warnings and users
    getting irritated when they can't kill a process in the middle of one
    of these long-running readdir operations. Fix this by adding checks to
    ext4_readdir() and ext4_htree_fill_tree().

    Reported-by: Benjamin LaHaise
    Google-Bug-Id: 27880676
    Signed-off-by: Theodore Ts'o

    Theodore Ts'o
     
  • If the lower or upper directory of an overlayfs mount belongs to a btrfs
    file system and we fsync the file through the overlayfs' merged
    directory, we end up accessing an inode that doesn't belong to btrfs as
    if it were a btrfs inode in btrfs_sync_file(), resulting in a crash like
    the following:

    [ 7782.588845] BUG: unable to handle kernel NULL pointer dereference at 0000000000000544
    [ 7782.590624] IP: [] btrfs_sync_file+0x11b/0x3e9 [btrfs]
    [ 7782.591931] PGD 4d954067 PUD 1e878067 PMD 0
    [ 7782.592016] Oops: 0002 [#6] PREEMPT SMP DEBUG_PAGEALLOC
    [ 7782.592016] Modules linked in: btrfs overlay ppdev crc32c_generic evdev xor raid6_pq psmouse pcspkr sg serio_raw acpi_cpufreq parport_pc parport tpm_tis i2c_piix4 tpm i2c_core processor button loop autofs4 ext4 crc16 mbcache jbd2 sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix virtio_pci libata virtio_ring virtio scsi_mod e1000 floppy [last unloaded: btrfs]
    [ 7782.592016] CPU: 10 PID: 16437 Comm: xfs_io Tainted: G D 4.5.0-rc6-btrfs-next-26+ #1
    [ 7782.592016] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
    [ 7782.592016] task: ffff88001b8d40c0 ti: ffff880137488000 task.ti: ffff880137488000
    [ 7782.592016] RIP: 0010:[] [] btrfs_sync_file+0x11b/0x3e9 [btrfs]
    [ 7782.592016] RSP: 0018:ffff88013748be40 EFLAGS: 00010286
    [ 7782.592016] RAX: 0000000080000000 RBX: ffff880133b30c88 RCX: 0000000000000001
    [ 7782.592016] RDX: 0000000000000001 RSI: ffffffff8148fec0 RDI: 00000000ffffffff
    [ 7782.592016] RBP: ffff88013748bec0 R08: 0000000000000001 R09: 0000000000000000
    [ 7782.624248] R10: ffff88013748be40 R11: 0000000000000246 R12: 0000000000000000
    [ 7782.624248] R13: 0000000000000000 R14: 00000000009305a0 R15: ffff880015e3be40
    [ 7782.624248] FS: 00007fa83b9cb700(0000) GS:ffff88023ed40000(0000) knlGS:0000000000000000
    [ 7782.624248] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 7782.624248] CR2: 0000000000000544 CR3: 00000001fa652000 CR4: 00000000000006e0
    [ 7782.624248] Stack:
    [ 7782.624248] ffffffff8108b5cc ffff88013748bec0 0000000000000246 ffff8800b005ded0
    [ 7782.624248] ffff880133b30d60 8000000000000000 7fffffffffffffff 0000000000000246
    [ 7782.624248] 0000000000000246 ffffffff81074f9b ffffffff8104357c ffff880015e3be40
    [ 7782.624248] Call Trace:
    [ 7782.624248] [] ? arch_local_irq_save+0x9/0xc
    [ 7782.624248] [] ? ___might_sleep+0xce/0x217
    [ 7782.624248] [] ? __do_page_fault+0x3c0/0x43a
    [ 7782.624248] [] vfs_fsync_range+0x8c/0x9e
    [ 7782.624248] [] vfs_fsync+0x1c/0x1e
    [ 7782.624248] [] do_fsync+0x31/0x4a
    [ 7782.624248] [] SyS_fsync+0x10/0x14
    [ 7782.624248] [] entry_SYSCALL_64_fastpath+0x12/0x6b
    [ 7782.624248] Code: 85 c0 0f 85 e2 02 00 00 48 8b 45 b0 31 f6 4c 29 e8 48 ff c0 48 89 45 a8 48 8d 83 d8 00 00 00 48 89 c7 48 89 45 a0 e8 fc 43 18 e1 41 ff 84 24 44 05 00 00 48 8b 83 58 ff ff ff 48 c1 e8 07 83
    [ 7782.624248] RIP [] btrfs_sync_file+0x11b/0x3e9 [btrfs]
    [ 7782.624248] RSP
    [ 7782.624248] CR2: 0000000000000544
    [ 7782.661994] ---[ end trace 721e14960eb939bc ]---

    This started happening since commit 4bacc9c9234 (overlayfs: Make f_path
    always point to the overlay and f_inode to the underlay) and even though
    after this change we could still access the btrfs inode through
    struct file->f_mapping->host or struct file->f_inode, we would end up
    hitting similar issues later on at check_parent_dirs_for_sync()
    because the dentry we got (from struct file->f_path.dentry) was from
    overlayfs and not from btrfs, that is, we had no way of getting the dentry
    that belonged to btrfs (we always got the dentry that belonged to
    overlayfs).

    The new patch from Miklos Szeredi, titled "vfs: add file_dentry()" and
    recently submitted to linux-fsdevel, adds a file_dentry() API that allows
    us to get the btrfs dentry from the input file and therefore to fsync
    when the upper and lower directories belong to btrfs filesystems.

    This issue has been reported several times by users in the mailing list
    and bugzilla. A test case for xfstests is being submitted as well.

    Fixes: 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay")
    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=101951
    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=109791
    Signed-off-by: Filipe Manana
    Signed-off-by: Chris Mason
    Cc: stable@vger.kernel.org

    Filipe Manana
     
  • In the following patch,

    f2fs: split journal cache from curseg cache

    the journal cache is split from the curseg cache. So IO write statistics
    should be retrieved from the journal cache and not from curseg->sum_blk.
    Otherwise, we get 0 and the stat is lost.

    Signed-off-by: Shuoran Liu
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Shuoran Liu
     
  • In the encrypted symlink case, we should check whether the symname is
    corrupted only after decrypting it.
    Otherwise, we can report -ENOENT incorrectly if the encrypted symname
    starts with '\0'.
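
    The ordering issue can be shown with a toy example. This is a sketch
    under loud assumptions: toy_decrypt() is a single-byte XOR standing in
    for real decryption, and read_encrypted_symlink() is a hypothetical
    helper, not the f2fs code.

```c
#include <errno.h>
#include <stddef.h>

/* Toy "cipher": XOR with a fixed key byte. NOT real cryptography. */
static void toy_decrypt(char *buf, size_t len, unsigned char key)
{
    size_t i;

    for (i = 0; i < len; i++)
        buf[i] = (char)((unsigned char)buf[i] ^ key);
}

/* Decrypt in place, then validate the plaintext name. */
int read_encrypted_symlink(char *buf, size_t len, unsigned char key)
{
    if (len == 0)
        return -ENOENT;

    /* Testing buf[0] == '\0' at this point would inspect ciphertext
     * and could reject a valid link whose first encrypted byte is 0. */
    toy_decrypt(buf, len, key);

    if (buf[0] == '\0')
        return -ENOENT;     /* the plaintext name really is empty */

    return 0;
}
```

    With key 'a', the plaintext "a/b" encrypts to a buffer whose first byte
    is zero. Checking before decryption would reject it; checking afterwards
    accepts it.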

    Cc: stable 4.5+
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim