17 Apr, 2019

4 commits

  • commit a89afe58f1a74aac768a5eb77af95ef4ee15beaa upstream.

    If the last bio returned is not dio->bio, the status of that bio is
    not assigned to dio->bio when it is an error. This makes the status of
    the whole IO wrong.

    ksoftirqd/21-117 [021] ..s. 4017.966090: 8,0 C N 4883648 [0]
    -0 [018] ..s. 4017.970888: 8,0 C WS 4924800 + 1024 [0]
    -0 [018] ..s. 4017.970909: 8,0 D WS 4935424 + 1024 []
    -0 [018] ..s. 4017.970924: 8,0 D WS 4936448 + 321 []
    ksoftirqd/21-117 [021] ..s. 4017.995033: 8,0 C R 4883648 + 336 [65475]
    ksoftirqd/21-117 [021] d.s. 4018.001988: myprobe1: (blkdev_bio_end_io+0x0/0x168) bi_status=7
    ksoftirqd/21-117 [021] d.s. 4018.001992: myprobe: (aio_complete_rw+0x0/0x148) x0=0xffff802f2595ad80 res=0x12a000 res2=0x0

    We always have to assign bio->bi_status to dio->bio.bi_status because we
    will only check dio->bio.bi_status when we return the whole IO to
    the upper layer.
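
The rule in the paragraph above can be sketched in plain C. This is a minimal user-space illustration, not the kernel code: the `sk_` types and `sk_bio_end_io()` are hypothetical stand-ins for the real bio and dio structures, and only the status field is modelled.

```c
/* Hypothetical, simplified stand-ins for struct bio and the dio wrapper. */
struct sk_bio {
    int status;             /* 0 means success, non-zero is an error code */
};

struct sk_dio {
    struct sk_bio bio;      /* the bio whose status the upper layer checks */
};

/*
 * Completion handler sketch: every completing bio reports into dio->bio,
 * so an error on any bio is visible when the whole IO is returned.
 * The first error seen is kept.
 */
void sk_bio_end_io(struct sk_dio *dio, const struct sk_bio *bio)
{
    if (bio->status && !dio->bio.status)
        dio->bio.status = bio->status;
}
```

With this shape, the order in which the bios complete no longer decides whether the error reaches the caller.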

    Fixes: 542ff7bf18c6 ("block: new direct I/O implementation")
    Cc: stable@vger.kernel.org
    Cc: Christoph Hellwig
    Cc: Jens Axboe
    Reviewed-by: Ming Lei
    Signed-off-by: Jason Yan
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Jason Yan
     
  • commit 272e5326c7837697882ce3162029ba893059b616 upstream.

    The compression property resets to NULL, instead of the old value if we
    fail to set the new compression parameter.

    $ btrfs prop get /btrfs compression
    compression=lzo
    $ btrfs prop set /btrfs compression zli
    ERROR: failed to set compression for /btrfs: Invalid argument
    $ btrfs prop get /btrfs compression

    This is because the compression property ->validate() is successful for
    'zli' as the strncmp() used the length passed from the userspace.

    Fix it by using the expected string length in strncmp().
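
A minimal user-space sketch of why the caller-supplied length lets 'zli' through, and how comparing against the expected length rejects it. The function names are hypothetical, and the value is assumed to be a NUL-terminated string; the real validation lives in the btrfs property code and is not reproduced here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Buggy shape: compares only `len` bytes, where `len` comes from the caller. */
bool compression_valid_user_len(const char *value, size_t len)
{
    return strncmp("lzo", value, len) == 0 ||
           strncmp("zlib", value, len) == 0 ||
           strncmp("zstd", value, len) == 0;
}

/* Fixed shape: always compares the full expected name as a prefix. */
bool compression_valid_expected_len(const char *value)
{
    return strncmp("lzo", value, 3) == 0 ||
           strncmp("zlib", value, 4) == 0 ||
           strncmp("zstd", value, 4) == 0;
}
```

With `value = "zli"` and `len = 3`, the first function only ever looks at "zli" and reports a match against "zlib"; the second compares four bytes, reaches the terminating NUL, and rejects the value.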

    Fixes: 63541927c8d1 ("Btrfs: add support for inode properties")
    Fixes: 5c1aab1dd544 ("btrfs: Add zstd support")
    CC: stable@vger.kernel.org # 4.14+
    Reviewed-by: Nikolay Borisov
    Signed-off-by: Anand Jain
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Anand Jain
     
  • commit 50398fde997f6be8faebdb5f38e9c9c467370f51 upstream.

    We let the zstd compression parameter pass even if it is not fully valid.
    For example:

    $ btrfs prop set /btrfs compression zst
    $ btrfs prop get /btrfs compression
    compression=zst

    zlib and lzo are fine.

    Fix it by checking the correct prefix length.

    Fixes: 5c1aab1dd544 ("btrfs: Add zstd support")
    CC: stable@vger.kernel.org # 4.14+
    Reviewed-by: Nikolay Borisov
    Signed-off-by: Anand Jain
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Anand Jain
     
  • commit f35f06c35560a86e841631f0243b83a984dc11a9 upstream.

    When a filesystem is mounted with the nologreplay mount option, which
    requires it to be mounted in RO mode as well, we can not allow discard on
    free space inside block groups, because log trees refer to extents that
    are not pinned in a block group's free space cache (pinning the extents is
    precisely the first phase of replaying a log tree).

    So do not allow the fitrim ioctl to do anything when the filesystem is
    mounted with the nologreplay option, because later it can be mounted RW
    without that option, which causes log replay to happen and can result in
    either a failure to replay the log trees (leading to a mount failure), a
    crash or some silent corruption.

    Reported-by: Darrick J. Wong
    Fixes: 96da09192cda ("btrfs: Introduce new mount option to disable tree log replay")
    CC: stable@vger.kernel.org # 4.9+
    Reviewed-by: Nikolay Borisov
    Signed-off-by: Filipe Manana
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Filipe Manana
     

06 Apr, 2019

17 commits

  • [ Upstream commit ac92985864e187a1735502f6a02f54eaa655b2aa ]

    When setting /sys/fs/f2fs//iostat_enable with a non-bool value, UBSAN
    reports the following warning.

    [ 7562.295484] ================================================================================
    [ 7562.296531] UBSAN: Undefined behaviour in fs/f2fs/f2fs.h:2776:10
    [ 7562.297651] load of value 64 is not a valid value for type '_Bool'
    [ 7562.298642] CPU: 1 PID: 7487 Comm: dd Not tainted 4.20.0-rc4+ #79
    [ 7562.298653] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
    [ 7562.298662] Call Trace:
    [ 7562.298760] dump_stack+0x46/0x5b
    [ 7562.298811] ubsan_epilogue+0x9/0x40
    [ 7562.298830] __ubsan_handle_load_invalid_value+0x72/0x90
    [ 7562.298863] f2fs_file_write_iter+0x29f/0x3f0
    [ 7562.298905] __vfs_write+0x115/0x160
    [ 7562.298922] vfs_write+0xa7/0x190
    [ 7562.298934] ksys_write+0x50/0xc0
    [ 7562.298973] do_syscall_64+0x4a/0xe0
    [ 7562.298992] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [ 7562.299001] RIP: 0033:0x7fa45ec19c00
    [ 7562.299004] Code: 73 01 c3 48 8b 0d 88 92 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d dd eb 2c 00 00 75 10 b8 01 00 00 00 0f 05 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ce 8f 01 00 48 89 04 24
    [ 7562.299044] RSP: 002b:00007ffca52b49e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
    [ 7562.299052] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa45ec19c00
    [ 7562.299059] RDX: 0000000000000400 RSI: 000000000093f000 RDI: 0000000000000001
    [ 7562.299065] RBP: 000000000093f000 R08: 0000000000000004 R09: 0000000000000000
    [ 7562.299071] R10: 00007ffca52b47b0 R11: 0000000000000246 R12: 0000000000000400
    [ 7562.299077] R13: 000000000093f000 R14: 000000000093f400 R15: 0000000000000000
    [ 7562.299091] ================================================================================

    So, if iostat_enable is enabled, set its value as true.
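
A minimal sketch of the normalization described above, assuming a hypothetical `struct sk_opts` and setter. It only illustrates collapsing any non-zero input to `true` before it is stored in a `bool`; it is not the f2fs sysfs handler.

```c
#include <stdbool.h>

/* Hypothetical option block; only the boolean switch is modelled. */
struct sk_opts {
    bool iostat_enable;
};

/*
 * Store a user-supplied number into a bool field. Any non-zero input
 * becomes true, so the field never holds a value such as 64 that is
 * invalid for _Bool.
 */
void sk_set_iostat_enable(struct sk_opts *opts, unsigned long t)
{
    opts->iostat_enable = (t != 0);
}
```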

    Signed-off-by: Sheng Yong
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Sasha Levin

    Sheng Yong
     
  • [ Upstream commit 538bcaa6261b77e71d37f5596c33127c1a3ec3f7 ]

    The jbd2 superblock is lockless now, so there is probably a race
    condition between writing it to disk and modifying its contents, which
    may lead to checksum errors. The following race is the one case that we
    have captured.

    jbd2                                fsstress
    jbd2_journal_commit_transaction
     jbd2_journal_update_sb_log_tail
      jbd2_write_superblock
       jbd2_superblock_csum_set         jbd2_journal_revoke
                                         jbd2_journal_set_features(revoke)
                                         modify superblock
       submit_bh(checksum incorrect)

    Fix this by locking the buffer head before modifying it. We always
    write the jbd2 superblock after we modify it, so this just means
    calling the lock_buffer() a little earlier.

    This checksum corruption problem can be reproduced by xfstests
    generic/475.

    Reported-by: zhangyi (F)
    Suggested-by: Jan Kara
    Signed-off-by: Theodore Ts'o
    Signed-off-by: Sasha Levin

    Theodore Ts'o
     
  • [ Upstream commit cc4b1242d7e3b42eed73881fc749944146493e4f ]

    The preadv2 and pwritev2 syscalls are supposed to emulate the readv and
    writev syscalls when offset == -1. Therefore the compat code should
    check for offset before calling do_compat_preadv64 and
    do_compat_pwritev64. This is the case for the preadv2 and pwritev2
    syscalls, but handling of offset == -1 is missing in their 64-bit
    equivalent.

    This patch fixes that, calling do_compat_readv and do_compat_writev when
    offset == -1. This fixes the following glibc tests on x32:
    - misc/tst-preadvwritev2
    - misc/tst-preadvwritev64v2
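
The dispatch described above can be illustrated from user space. This is a sketch under stated assumptions: `read_vectored()` is a hypothetical wrapper, and it uses the ordinary `readv()` and `preadv()` calls rather than the kernel's compat helpers.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE     /* for preadv() on glibc */
#endif
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Hypothetical wrapper mirroring the preadv2 convention: a position of -1
 * means "use and update the current file offset" (readv semantics), any
 * other value is an explicit offset (preadv semantics).
 */
ssize_t read_vectored(int fd, const struct iovec *iov, int iovcnt, off_t pos)
{
    if (pos == -1)
        return readv(fd, iov, iovcnt);
    return preadv(fd, iov, iovcnt, pos);
}
```

Without the `pos == -1` branch, the value -1 would be passed down as a real offset and the call would fail instead of falling back to the current position.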

    Cc: Alexander Viro
    Cc: H.J. Lu
    Signed-off-by: Aurelien Jarno
    Signed-off-by: Al Viro
    Signed-off-by: Sasha Levin

    Aurelien Jarno
     
  • [ Upstream commit f5fef4593653dfa2a865c485bb81415de51d5c99 ]

    [BUG]
    Btrfs qgroup will still hit EDQUOT under the following case:

    $ dev=/dev/test/test
    $ mnt=/mnt/btrfs
    $ umount $mnt &> /dev/null
    $ umount $dev &> /dev/null

    $ mkfs.btrfs -f $dev
    $ mount $dev $mnt -o nospace_cache

    $ btrfs subv create $mnt/subv
    $ btrfs quota enable $mnt
    $ btrfs quota rescan -w $mnt
    $ btrfs qgroup limit -e 1G $mnt/subv

    $ fallocate -l 900M $mnt/subv/padding
    $ sync

    $ rm $mnt/subv/padding

    # Hit EDQUOT
    $ xfs_io -f -c "pwrite 0 512M" $mnt/subv/real_file

    [CAUSE]
    Since commit a514d63882c3 ("btrfs: qgroup: Commit transaction in advance
    to reduce early EDQUOT"), btrfs no longer forces a transaction commit to
    reclaim more quota space.

    Instead, we just check the pertrans metadata reservation against some
    threshold and try to commit the transaction asynchronously.

    However, in the above case the pertrans metadata reservation is pretty
    small, so it will never trigger an asynchronous transaction commit.

    [FIX]
    Instead of only accounting pertrans metadata reservation, we calculate
    how much free space we have, and if there isn't much free space left,
    commit transaction asynchronously to try to free some space.

    This may slow down the fs when we have less than 32M free qgroup space,
    but should reduce a lot of false EDQUOT, so the cost should be
    acceptable.
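
A simplified sketch of the free-space check described in [FIX]. The helper name and its parameters are hypothetical, and only the 32M figure comes from the text above; the actual btrfs threshold logic may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* 32M threshold, taken from the commit message. */
#define SK_QGROUP_FREE_THRESHOLD (32ULL * 1024 * 1024)

/*
 * Hypothetical helper: decide whether an asynchronous transaction commit
 * is worth triggering, based on how much quota space is still free.
 * `limit` is the qgroup limit, `used` is referenced plus reserved bytes.
 */
bool sk_should_commit_async(uint64_t limit, uint64_t used)
{
    if (used >= limit)
        return true;                    /* no free space left at all */
    return (limit - used) < SK_QGROUP_FREE_THRESHOLD;
}
```

Checking the remaining free space, instead of only the pertrans metadata reservation, is what lets the case from [BUG] trigger a commit: the 900M of deleted data still counts as used until the transaction commits.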

    Signed-off-by: Qu Wenruo
    Signed-off-by: David Sterba
    Signed-off-by: Sasha Levin

    Qu Wenruo
     
  • [ Upstream commit dce30ca9e3b676fb288c33c1f4725a0621361185 ]

    guard_bio_eod() can truncate a segment in bio to allow it to do IO on
    odd last sectors of a device.

    It already checks if the IO starts past EOD, but it does not consider
    the possibility that an IO request starting within device boundaries can
    contain more than one segment past EOD.

    In such cases, truncated_bytes can be bigger than PAGE_SIZE, and will
    underflow bvec->bv_len.

    Fix this by checking if truncated_bytes is lower than PAGE_SIZE.
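
A minimal sketch of the underflow and of a guard against it, using a hypothetical helper and plain integers in place of the bio vector. It shows the general technique (check before subtracting from an unsigned length), not the exact kernel change.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: shrink the last segment by `truncated_bytes`.
 * Segment lengths are unsigned, so subtracting more than the segment
 * holds would wrap around to a huge value instead of going negative.
 * Returns false and leaves the length untouched in that case.
 */
bool sk_truncate_last_segment(uint32_t *seg_len, uint32_t truncated_bytes)
{
    if (truncated_bytes > *seg_len)
        return false;
    *seg_len -= truncated_bytes;
    return true;
}
```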

    This situation has been found on filesystems such as isofs and vfat,
    which don't check the device size before mount. If the device is
    smaller than the filesystem itself, a readahead on such a filesystem
    that spans EOD can trigger this situation, leading to a call to
    zero_user() with a wrong size, possibly corrupting memory.

    I didn't see any crash, or didn't let the system run long enough to
    check if memory corruption will be hit somewhere, but adding
    instrumentation to guard_bio_eod() to check the truncated_bytes size was
    enough to see the error.

    The following script can trigger the error.

    MNT=/mnt
    IMG=./DISK.img
    DEV=/dev/loop0

    mkfs.vfat $IMG
    mount $IMG $MNT
    cp -R /etc $MNT &> /dev/null
    umount $MNT

    losetup -D

    losetup --find --show --sizelimit 16247280 $IMG
    mount $DEV $MNT

    find $MNT -type f -exec cat {} + >/dev/null

    Kudos to Eric Sandeen for coming up with the reproducer above

    Reviewed-by: Ming Lei
    Signed-off-by: Carlos Maiolino
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Carlos Maiolino
     
  • [ Upstream commit 6e876c3dd205d30b0db6850e97a03d75457df007 ]

    In jbd2_journal_commit_transaction(), if we are in abort mode,
    we may flush the buffer without setting the descriptor block checksum
    by jumping to start_journal_io. When the fs is mounted again,
    jbd2_descriptor_block_csum_verify() fails.

    [ 271.379811] EXT4-fs (vdd): shut down requested (2)
    [ 271.381827] Aborting journal on device vdd-8.
    [ 271.597136] JBD2: Invalid checksum recovering block 22199 in log
    [ 271.598023] JBD2: recovery failed
    [ 271.598484] EXT4-fs (vdd): error loading journal

    Fix this problem by always setting the descriptor block checksum if the
    descriptor buffer is not NULL.

    This checksum problem can be reproduced by xfstests generic/388.

    Signed-off-by: luojiajun
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jan Kara
    Signed-off-by: Sasha Levin

    luojiajun
     
  • [ Upstream commit 68e2672f8fbd1e04982b8d2798dd318bf2515dd2 ]

    There is a NULL pointer dereference of devname in strspn()

    The oops looks something like:

    CIFS: Attempting to mount (null)
    BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
    ...
    RIP: 0010:strspn+0x0/0x50
    ...
    Call Trace:
    ? cifs_parse_mount_options+0x222/0x1710 [cifs]
    ? cifs_get_volume_info+0x2f/0x80 [cifs]
    cifs_setup_volume_info+0x20/0x190 [cifs]
    cifs_get_volume_info+0x50/0x80 [cifs]
    cifs_smb3_do_mount+0x59/0x630 [cifs]
    ? ida_alloc_range+0x34b/0x3d0
    cifs_do_mount+0x11/0x20 [cifs]
    mount_fs+0x52/0x170
    vfs_kern_mount+0x6b/0x170
    do_mount+0x216/0xdc0
    ksys_mount+0x83/0xd0
    __x64_sys_mount+0x25/0x30
    do_syscall_64+0x65/0x220
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Fix this by adding a NULL check on devname in cifs_parse_devname()
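
A minimal user-space sketch of guarding `strspn()` against a missing device name. The function name and the delimiter set are assumptions for illustration; the real check sits in cifs_parse_devname().

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: count the leading path delimiters of a device
 * name such as "//server/share". strspn() dereferences its argument,
 * so a NULL (or empty) name has to be rejected before the call.
 */
int sk_count_leading_delims(const char *devname)
{
    if (devname == NULL || *devname == '\0')
        return -EINVAL;
    return (int)strspn(devname, "/\\");
}
```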

    Signed-off-by: Yao Liu
    Signed-off-by: Steve French
    Signed-off-by: Sasha Levin

    Yao Liu
     
  • [ Upstream commit 969ae8e8d4ee54c99134d3895f2adf96047f5bee ]

    Old Windows versions or NetApp SMB servers will return
    NT_STATUS_NOT_SUPPORTED since they do not allow or implement
    FSCTL_VALIDATE_NEGOTIATE_INFO. The client should accept the response
    provided it's properly signed.

    See
    https://blogs.msdn.microsoft.com/openspecification/2012/06/28/smb3-secure-dialect-negotiation/

    and

    MS-SMB2 validate negotiate response processing:
    https://msdn.microsoft.com/en-us/library/hh880630.aspx

    The Samba client already handles it.
    https://bugzilla.samba.org/attachment.cgi?id=13285&action=edit

    Signed-off-by: Namjae Jeon
    Signed-off-by: Steve French
    Signed-off-by: Sasha Levin

    Namjae Jeon
     
  • [ Upstream commit 500e0b28ecd3c5aade98f3c3a339d18dcb166bb6 ]

    We use below condition to check inline_xattr_size boundary:

    if (!F2FS_OPTION(sbi).inline_xattr_size ||
    F2FS_OPTION(sbi).inline_xattr_size >=
    DEF_ADDRS_PER_INODE -
    F2FS_TOTAL_EXTRA_ATTR_SIZE -
    DEF_INLINE_RESERVED_SIZE -
    DEF_MIN_INLINE_SIZE)

    There are three problems in that check:
    - we should allow inline_xattr_size to equal the min size of the inline
      {data,dentry} area.
    - F2FS_TOTAL_EXTRA_ATTR_SIZE and inline_xattr_size are based on
      different size units: the previous one is 4 bytes, the latter one is
      1 byte.
    - DEF_MIN_INLINE_SIZE only indicates the min size of the inline data
      area, however, we need to consider the min size of the inline dentry
      area as well. A minimal inline dentry should contain at least two
      entries, '.' and '..', so the min inline_dentry size is 40 bytes.

    .bitmap      1 * 1 = 1
    .reserved    1 * 1 = 1
    .dentry     11 * 2 = 22
    .filename    8 * 2 = 16
    total                40
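
The 40-byte total can be re-derived from the table. A small sketch follows; the constant names are illustrative only and are not the f2fs macros.

```c
#include <stddef.h>

/* Per-entry sizes in bytes, copied from the table above. */
enum {
    SK_BITMAP_BYTES   = 1,
    SK_RESERVED_BYTES = 1,
    SK_DENTRY_BYTES   = 11,
    SK_FILENAME_BYTES = 8,
    SK_MIN_ENTRIES    = 2   /* '.' and '..' */
};

/* Minimum inline dentry area: one bitmap byte, one reserved byte,
 * and two dentries with their filename slots. */
size_t sk_min_inline_dentry_size(void)
{
    return SK_BITMAP_BYTES * 1 +
           SK_RESERVED_BYTES * 1 +
           SK_DENTRY_BYTES * SK_MIN_ENTRIES +
           SK_FILENAME_BYTES * SK_MIN_ENTRIES;
}
```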

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Sasha Levin

    Chao Yu
     
  • [ Upstream commit 259594bea574e515a148171b5cd84ce5cbdc028a ]

    When compiling with -Wformat, clang emits the following warnings:

    fs/cifs/smb1ops.c:312:20: warning: format specifies type 'unsigned
    short' but the argument has type 'unsigned int' [-Wformat]
    tgt_total_cnt, total_in_tgt);
    ^~~~~~~~~~~~

    fs/cifs/cifs_dfs_ref.c:289:4: warning: format specifies type 'short'
    but the argument has type 'int' [-Wformat]
    ref->flags, ref->server_type);
    ^~~~~~~~~~

    fs/cifs/cifs_dfs_ref.c:289:16: warning: format specifies type 'short'
    but the argument has type 'int' [-Wformat]
    ref->flags, ref->server_type);
    ^~~~~~~~~~~~~~~~

    fs/cifs/cifs_dfs_ref.c:291:4: warning: format specifies type 'short'
    but the argument has type 'int' [-Wformat]
    ref->ref_flag, ref->path_consumed);
    ^~~~~~~~~~~~~

    fs/cifs/cifs_dfs_ref.c:291:19: warning: format specifies type 'short'
    but the argument has type 'int' [-Wformat]
    ref->ref_flag, ref->path_consumed);
    ^~~~~~~~~~~~~~~~~~

    The types of these arguments are unconditionally defined, so this patch
    updates the format characters to the correct ones for ints and unsigned
    ints.
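
A small user-space illustration of matching format characters to argument types. The function and its message are hypothetical; the point is only that an `unsigned int` takes `%u` and an `int` takes `%d`, where `%hu` or `%hd` would draw the -Wformat warnings quoted above.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical formatter: `total` is an unsigned int and `flags` is an
 * int, so the format string uses %u and %d rather than the short
 * variants %hu and %hd.
 */
int sk_format_counts(char *buf, size_t size, unsigned int total, int flags)
{
    return snprintf(buf, size, "total=%u flags=%d", total, flags);
}
```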

    Link: https://github.com/ClangBuiltLinux/linux/issues/378

    Signed-off-by: Louis Taylor
    Signed-off-by: Steve French
    Reviewed-by: Nick Desaulniers
    Signed-off-by: Sasha Levin

    Louis Taylor
     
  • [ Upstream commit 5704a06810682683355624923547b41540e2801a ]

    (Taken from https://bugzilla.kernel.org/show_bug.cgi?id=200647)

    'get_unused_fd_flags' in a kthread causes a kernel crash. It works fine
    on 4.1, but crashes after getting 64 fds. It also crashes on
    ubuntu1404/1604/1804 and centos7.5, and the crash messages are almost
    the same.

    The crash message on centos7.5 is shown below:

    start fd 61
    start fd 62
    start fd 63
    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: __wake_up_common+0x2e/0x90
    PGD 0
    Oops: 0000 [#1] SMP
    Modules linked in: test(OE) xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter devlink sunrpc kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd sg ppdev pcspkr virtio_balloon parport_pc parport i2c_piix4 joydev ip_tables xfs libcrc32c sr_mod cdrom sd_mod crc_t10dif crct10dif_generic ata_generic pata_acpi virtio_scsi virtio_console virtio_net cirrus drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm crct10dif_pclmul crct10dif_common crc32c_intel drm ata_piix serio_raw libata virtio_pci virtio_ring i2c_core
    virtio floppy dm_mirror dm_region_hash dm_log dm_mod
    CPU: 2 PID: 1820 Comm: test_fd Kdump: loaded Tainted: G OE ------------ 3.10.0-862.3.3.el7.x86_64 #1
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
    task: ffff8e92b9431fa0 ti: ffff8e94247a0000 task.ti: ffff8e94247a0000
    RIP: 0010:__wake_up_common+0x2e/0x90
    RSP: 0018:ffff8e94247a2d18 EFLAGS: 00010086
    RAX: 0000000000000000 RBX: ffffffff9d09daa0 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffffffff9d09daa0
    RBP: ffff8e94247a2d50 R08: 0000000000000000 R09: ffff8e92b95dfda8
    R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff9d09daa8
    R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000003
    FS: 0000000000000000(0000) GS:ffff8e9434e80000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000000 CR3: 000000017c686000 CR4: 00000000000207e0
    Call Trace:
    __wake_up+0x39/0x50
    expand_files+0x131/0x250
    __alloc_fd+0x47/0x170
    get_unused_fd_flags+0x30/0x40
    test_fd+0x12a/0x1c0 [test]
    kthread+0xd1/0xe0
    ret_from_fork_nospec_begin+0x21/0x21
    Code: 66 90 55 48 89 e5 41 57 41 89 f7 41 56 41 89 ce 41 55 41 54 49 89 fc 49 83 c4 08 53 48 83 ec 10 48 8b 47 08 89 55 cc 4c 89 45 d0 8b 08 49 39 c4 48 8d 78 e8 4c 8d 69 e8 75 08 eb 3b 4c 89 ef
    RIP __wake_up_common+0x2e/0x90
    RSP
    CR2: 0000000000000000

    This issue has existed since CentOS 7.5 (3.10.0-862); CentOS 7.4
    (3.10.0-693.21.1) is OK. Root cause: the item 'resize_wait' is not
    initialized before being used.

    Reported-by: Richard Zhang
    Reviewed-by: Andrew Morton
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin

    Shuriyc Chu
     
  • [ Upstream commit 9083977dabf3833298ddcd40dee28687f1e6b483 ]

    Fix the warning below, which is caused by taking a mutex in atomic context.

    BUG: sleeping function called from invalid context at kernel/locking/mutex.c:98
    in_atomic(): 1, irqs_disabled(): 0, pid: 585, name: sh
    Preemption disabled at: __radix_tree_preload+0x28/0x130
    Call trace:
    dump_backtrace+0x0/0x2b4
    show_stack+0x20/0x28
    dump_stack+0xa8/0xe0
    ___might_sleep+0x144/0x194
    __might_sleep+0x58/0x8c
    mutex_lock+0x2c/0x48
    f2fs_trace_pid+0x88/0x14c
    f2fs_set_node_page_dirty+0xd0/0x184

    Do not use f2fs_radix_tree_insert() to avoid doing cond_resched() with
    spin_lock() acquired.

    Signed-off-by: Sahitya Tummala
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Sasha Levin

    Sahitya Tummala
     
  • [ Upstream commit cc725ef3cb202ef2019a3c67c8913efa05c3cce6 ]

    In the process of creating a node, a NULL pointer dereference occurs
    in the kernel if o2cb_ctl fails in the interval (mkdir,
    o2cb_set_node_attribute(node_num)] in function o2cb_add_node.

    The node num is initialized to 0 in function o2nm_node_group_make_item,
    so o2nm_node_group_drop_item will mistake the node number 0 for a valid
    node number when we delete the node before the node number is set
    correctly. If the local node number of the current host happens to be
    0, cluster->cl_local_node will be set to O2NM_INVALID_NODE_NUM while
    o2hb_thread is still running. The panic stack is generated as follows:

    o2hb_thread
        \-o2hb_do_disk_heartbeat
            \-o2hb_check_own_slot
                |-slot = &reg->hr_slots[o2nm_this_node()];
                //o2nm_this_node() return O2NM_INVALID_NODE_NUM

    We need to check whether the node number is set when we delete the node.

    Link: http://lkml.kernel.org/r/133d8045-72cc-863e-8eae-5013f9f6bc51@huawei.com
    Signed-off-by: Jia Guo
    Reviewed-by: Joseph Qi
    Acked-by: Jun Piao
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Changwei Ge
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin

    Jia Guo
     
  • [ Upstream commit aadcef64b22f668c1a107b86d3521d9cac915c24 ]

    As Jiqun Li reported in bugzilla:

    https://bugzilla.kernel.org/show_bug.cgi?id=202883

    Sometimes a deadlock occurs when the SYS_getdents64 system call is made
    while fsync() is called by another process.

    monkey running on android9.0

    1. task 9785 holds sbi->cp_rwsem and is waiting for lock_page()
    2. task 10349 holds mm_sem and is waiting for sbi->cp_rwsem
    3. task 9709 holds lock_page() and is waiting for mm_sem

    So this is a deadlock scenario.

    The task stacks, as shown by the crash tool, are as follows:

    crash_arm64> bt ffffffc03c354080
    PID: 9785 TASK: ffffffc03c354080 CPU: 1 COMMAND: "RxIoScheduler-3"
    >> #7 [ffffffc01b50fac0] __lock_page at ffffff80081b11e8

    crash-arm64> bt 10349
    PID: 10349 TASK: ffffffc018b83080 CPU: 1 COMMAND: "BUGLY_ASYNC_UPL"
    >> #3 [ffffffc01f8cfa40] rwsem_down_read_failed at ffffff8008a93afc
    PC: 00000033 LR: 00000000 SP: 00000000 PSTATE: ffffffffffffffff

    crash-arm64> bt 9709
    PID: 9709 TASK: ffffffc03e7f3080 CPU: 1 COMMAND: "IntentService[A"
    >> #3 [ffffffc001e67850] rwsem_down_read_failed at ffffff8008a93afc
    >> #8 [ffffffc001e67b80] el1_ia at ffffff8008084fc4
    PC: ffffff8008274114 [compat_filldir64+120]
    LR: ffffff80083584d4 [f2fs_fill_dentries+448]
    SP: ffffffc001e67b80 PSTATE: 80400145
    X29: ffffffc001e67b80 X28: 0000000000000000 X27: 000000000000001a
    X26: 00000000000093d7 X25: ffffffc070d52480 X24: 0000000000000008
    X23: 0000000000000028 X22: 00000000d43dfd60 X21: ffffffc001e67e90
    X20: 0000000000000011 X19: ffffff80093a4000 X18: 0000000000000000
    X17: 0000000000000000 X16: 0000000000000000 X15: 0000000000000000
    X14: ffffffffffffffff X13: 0000000000000008 X12: 0101010101010101
    X11: 7f7f7f7f7f7f7f7f X10: 6a6a6a6a6a6a6a6a X9: 7f7f7f7f7f7f7f7f
    X8: 0000000080808000 X7: ffffff800827409c X6: 0000000080808000
    X5: 0000000000000008 X4: 00000000000093d7 X3: 000000000000001a
    X2: 0000000000000011 X1: ffffffc070d52480 X0: 0000000000800238
    >> #9 [ffffffc001e67be0] f2fs_fill_dentries at ffffff80083584d0
    PC: 0000003c LR: 00000000 SP: 00000000 PSTATE: 000000d9
    X12: f48a02ff X11: d4678960 X10: d43dfc00 X9: d4678ae4
    X8: 00000058 X7: d4678994 X6: d43de800 X5: 000000d9
    X4: d43dfc0c X3: d43dfc10 X2: d46799c8 X1: 00000000
    X0: 00001068

    The potential deadlock below will happen between three threads:
    Thread A                Thread B                Thread C
    - f2fs_do_sync_file
     - f2fs_write_checkpoint
      - down_write(&sbi->node_change) -- 1)
                            - do_page_fault
                             - down_write(&mm->mmap_sem) -- 2)
                              - do_wp_page
                               - f2fs_vm_page_mkwrite
                                                    - getdents64
                                                     - f2fs_read_inline_dir
                                                      - lock_page -- 3)
      - f2fs_sync_node_pages
       - lock_page -- 3)
                                - __do_map_lock
                                 - down_read(&sbi->node_change) -- 1)
                                                      - f2fs_fill_dentries
                                                       - dir_emit
                                                        - compat_filldir64
                                                         - do_page_fault
                                                          - down_read(&mm->mmap_sem) -- 2)

    Since f2fs_readdir is protected by inode.i_rwsem, there should not be
    any updates to the inode page, so it is safe to look up dents in the
    inode page without holding its lock. Drop the lock to improve the
    concurrency of readdir and avoid the potential deadlock.

    Reported-by: Jiqun Li
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Sasha Levin

    Chao Yu
     
  • [ Upstream commit 2c28aba8b2e2a51749fa66e01b68e1cd5b53e022 ]

    With the testcase below, we fail to find an existing xattr entry:

    1. mkfs.f2fs -O extra_attr -O flexible_inline_xattr /dev/zram0
    2. mount -t f2fs -o inline_xattr_size=1 /dev/zram0 /mnt/f2fs/
    3. touch /mnt/f2fs/file
    4. setfattr -n "user.name" -v 0 /mnt/f2fs/file
    5. getfattr -n "user.name" /mnt/f2fs/file

    /mnt/f2fs/file: user.name: No such attribute

    The reason is that, for an inode with a very small inline xattr size,
    __find_inline_xattr() fails to traverse any entry because the first
    entry may not be loaded from the xattr node yet. Later, we may skip
    checking the entire xattr data in __find_xattr(), resulting in this
    wrong condition.

    This patch adds a condition to check for this case and avoid the issue.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Sasha Levin

    Chao Yu
     
  • [ Upstream commit bc31d0cdcfbadb6258b45db97e93b1c83822ba33 ]

    We have a customer reporting crashes in lock_get_status() with many
    "Leaked POSIX lock" messages preceding the crash.

    Leaked POSIX lock on dev=0x0:0x56 ...
    Leaked POSIX lock on dev=0x0:0x56 ...
    Leaked POSIX lock on dev=0x0:0x56 ...
    Leaked POSIX lock on dev=0x0:0x53 ...
    Leaked POSIX lock on dev=0x0:0x53 ...
    Leaked POSIX lock on dev=0x0:0x53 ...
    Leaked POSIX lock on dev=0x0:0x53 ...
    POSIX: fl_owner=ffff8900e7b79380 fl_flags=0x1 fl_type=0x1 fl_pid=20709
    Leaked POSIX lock on dev=0x0:0x4b ino...
    Leaked locks on dev=0x0:0x4b ino=0xf911400000029:
    POSIX: fl_owner=ffff89f41c870e00 fl_flags=0x1 fl_type=0x1 fl_pid=19592
    stack segment: 0000 [#1] SMP
    Modules linked in: binfmt_misc msr tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag rpcsec_gss_krb5 arc4 ecb auth_rpcgss nfsv4 md4 nfs nls_utf8 lockd grace cifs sunrpc ccm dns_resolver fscache af_packet iscsi_ibft iscsi_boot_sysfs vmw_vsock_vmci_transport vsock xfs libcrc32c sb_edac edac_core crct10dif_pclmul crc32_pclmul ghash_clmulni_intel drbg ansi_cprng vmw_balloon aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd joydev pcspkr vmxnet3 i2c_piix4 vmw_vmci shpchp fjes processor button ac btrfs xor raid6_pq sr_mod cdrom ata_generic sd_mod ata_piix vmwgfx crc32c_intel drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm serio_raw ahci libahci drm libata vmw_pvscsi sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4

    Supported: Yes
    CPU: 6 PID: 28250 Comm: lsof Not tainted 4.4.156-94.64-default #1
    Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
    task: ffff88a345f28740 ti: ffff88c74005c000 task.ti: ffff88c74005c000
    RIP: 0010:[] [] lock_get_status+0x9b/0x3b0
    RSP: 0018:ffff88c74005fd90 EFLAGS: 00010202
    RAX: ffff89bde83e20ae RBX: ffff89e870003d18 RCX: 0000000049534f50
    RDX: ffffffff81a3541f RSI: ffffffff81a3544e RDI: ffff89bde83e20ae
    RBP: 0026252423222120 R08: 0000000020584953 R09: 000000000000ffff
    R10: 0000000000000000 R11: ffff88c74005fc70 R12: ffff89e5ca7b1340
    R13: 00000000000050e5 R14: ffff89e870003d30 R15: ffff89e5ca7b1340
    FS: 00007fafd64be800(0000) GS:ffff89f41fd00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000001c80018 CR3: 000000a522048000 CR4: 0000000000360670
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Stack:
    0000000000000208 ffffffff81a3d6b6 ffff89e870003d30 ffff89e870003d18
    ffff89e5ca7b1340 ffff89f41738d7c0 ffff89e870003d30 ffff89e5ca7b1340
    ffffffff8125e08f 0000000000000000 ffff89bc22b67d00 ffff88c74005ff28
    Call Trace:
    [] locks_show+0x2f/0x70
    [] seq_read+0x251/0x3a0
    [] proc_reg_read+0x3c/0x70
    [] __vfs_read+0x26/0x140
    [] vfs_read+0x7a/0x120
    [] SyS_read+0x42/0xa0
    [] entry_SYSCALL_64_fastpath+0x1e/0xb7

    When Linux closes a FD (close(), close-on-exec, dup2(), ...) it calls
    filp_close() which also removes all posix locks.

    The lock struct is initialized like so in filp_close() and passed
    down to cifs

    ...
    lock.fl_type = F_UNLCK;
    lock.fl_flags = FL_POSIX | FL_CLOSE;
    lock.fl_start = 0;
    lock.fl_end = OFFSET_MAX;
    ...

    Note the FL_CLOSE flag, which hints the VFS code that this unlocking
    is done for closing the fd.

    filp_close()
      locks_remove_posix(filp, id);
        vfs_lock_file(filp, F_SETLK, &lock, NULL);
          return filp->f_op->lock(filp, cmd, fl) => cifs_lock()
            rc = cifs_setlk(file, flock, type, wait_flag, posix_lck, lock, unlock, xid);
              rc = server->ops->mand_unlock_range(cfile, flock, xid);
              if (flock->fl_flags & FL_POSIX && !rc)
                      rc = locks_lock_file_wait(file, flock)

    Notice how, if rc != 0, we don't call locks_lock_file_wait(), which
    does the generic VFS lock/unlock/wait work on the inode.

    If we are closing the handle, the SMB server is supposed to remove any
    locks associated with it. Similarly, cifs.ko frees and wakes up any
    lock and lock waiter when closing the file:

    cifs_close()
      cifsFileInfo_put(file->private_data)
        /*
         * Delete any outstanding lock records. We'll lose them when the file
         * is closed anyway.
         */
        down_write(&cifsi->lock_sem);
        list_for_each_entry_safe(li, tmp, &cifs_file->llist->locks, llist) {
                list_del(&li->llist);
                cifs_del_lock_waiters(li);
                kfree(li);
        }
        list_del(&cifs_file->llist->llist);
        kfree(cifs_file->llist);
        up_write(&cifsi->lock_sem);

    So we can safely ignore unlocking failures in cifs_lock() if they
    happen with the FL_CLOSE flag hint set as both the server and the
    client take care of it during the actual closing.
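
The decision described above can be sketched as a small predicate. The flag values and the function name are hypothetical stand-ins (the kernel's FL_POSIX and FL_CLOSE are not redefined here), and the sketch only models the control flow, not the locking itself.

```c
#include <stdbool.h>

/* Hypothetical flag bits standing in for FL_POSIX and FL_CLOSE. */
#define SK_FL_POSIX 0x1u
#define SK_FL_CLOSE 0x2u

/*
 * Decide whether the local (VFS-side) lock bookkeeping should still run
 * after the server-side unlock returned `server_rc`. A failed unlock is
 * ignored only when the request carries the close hint, because both
 * sides drop the locks when the file is closed anyway.
 */
bool sk_should_update_local_locks(unsigned int fl_flags, int server_rc)
{
    if (!(fl_flags & SK_FL_POSIX))
        return false;
    if (server_rc == 0)
        return true;
    return (fl_flags & SK_FL_CLOSE) != 0;
}
```

Skipping the local bookkeeping on a failed unlock during close is what leaves the stale entries behind that later show up as "Leaked POSIX lock" messages.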

    This is not a proper fix for the unlocking failure but it's safe and
    it seems to prevent the lock leakages and crashes the customer
    experiences.

    Signed-off-by: Aurelien Aptel
    Signed-off-by: NeilBrown
    Signed-off-by: Steve French
    Acked-by: Pavel Shilovsky
    Signed-off-by: Sasha Levin

    Aurelien Aptel
     
  • commit 5e86bdda41534e17621d5a071b294943cae4376e upstream.

    Currently, we release the indirect buffer at each point where we are
    done with it in ext4_ind_remove_space(), so brelse() and
    BUFFER_TRACE() calls are scattered everywhere. This is fragile and hard
    to read, and we may well forget to release the buffer some day. This
    patch cleans up the code by moving the code that releases the buffers
    to the end of the function.

    Signed-off-by: zhangyi (F)
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jan Kara
    Cc: Jari Ruusu
    Signed-off-by: Greg Kroah-Hartman

    zhangyi (F)
     

03 Apr, 2019

11 commits

  • commit 23da9588037ecdd4901db76a5b79a42b529c4ec3 upstream.

    Syzkaller reports:

    kasan: GPF could be caused by NULL-ptr deref or user memory access
    general protection fault: 0000 [#1] SMP KASAN PTI
    CPU: 1 PID: 5373 Comm: syz-executor.0 Not tainted 5.0.0-rc8+ #3
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
    RIP: 0010:put_links+0x101/0x440 fs/proc/proc_sysctl.c:1599
    Code: 00 0f 85 3a 03 00 00 48 8b 43 38 48 89 44 24 20 48 83 c0 38 48 89 c2 48 89 44 24 28 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 3c 02 00 0f 85 fe 02 00 00 48 8b 74 24 20 48 c7 c7 60 2a 9d 91
    RSP: 0018:ffff8881d828f238 EFLAGS: 00010202
    RAX: dffffc0000000000 RBX: ffff8881e01b1140 RCX: ffffffff8ee98267
    RDX: 0000000000000007 RSI: ffffc90001479000 RDI: ffff8881e01b1178
    RBP: dffffc0000000000 R08: ffffed103ee27259 R09: ffffed103ee27259
    R10: 0000000000000001 R11: ffffed103ee27258 R12: fffffffffffffff4
    R13: 0000000000000006 R14: ffff8881f59838c0 R15: dffffc0000000000
    FS: 00007f072254f700(0000) GS:ffff8881f7100000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fff8b286668 CR3: 00000001f0542002 CR4: 00000000007606e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    PKRU: 55555554
    Call Trace:
    drop_sysctl_table+0x152/0x9f0 fs/proc/proc_sysctl.c:1629
    get_subdir fs/proc/proc_sysctl.c:1022 [inline]
    __register_sysctl_table+0xd65/0x1090 fs/proc/proc_sysctl.c:1335
    br_netfilter_init+0xbc/0x1000 [br_netfilter]
    do_one_initcall+0xfa/0x5ca init/main.c:887
    do_init_module+0x204/0x5f6 kernel/module.c:3460
    load_module+0x66b2/0x8570 kernel/module.c:3808
    __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902
    do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x462e99
    Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
    RSP: 002b:00007f072254ec58 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
    RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99
    RDX: 0000000000000000 RSI: 0000000020000280 RDI: 0000000000000003
    RBP: 00007f072254ec70 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 00007f072254f6bc
    R13: 00000000004bcefa R14: 00000000006f6fb0 R15: 0000000000000004
    Modules linked in: br_netfilter(+) dvb_usb_dibusb_mc_common dib3000mc dibx000_common dvb_usb_dibusb_common dvb_usb_dw2102 dvb_usb classmate_laptop palmas_regulator cn videobuf2_v4l2 v4l2_common snd_soc_bd28623 mptbase snd_usb_usx2y snd_usbmidi_lib snd_rawmidi wmi libnvdimm lockd sunrpc grace rc_kworld_pc150u rc_core rtc_da9063 sha1_ssse3 i2c_cros_ec_tunnel adxl34x_spi adxl34x nfnetlink lib80211 i5500_temp dvb_as102 dvb_core videobuf2_common videodev media videobuf2_vmalloc videobuf2_memops udc_core lnbp22 leds_lp3952 hid_roccat_ryos s1d13xxxfb mtd vport_geneve openvswitch nf_conncount nf_nat_ipv6 nsh geneve udp_tunnel ip6_udp_tunnel snd_soc_mt6351 sis_agp phylink snd_soc_adau1761_spi snd_soc_adau1761 snd_soc_adau17x1 snd_soc_core snd_pcm_dmaengine ac97_bus snd_compress snd_soc_adau_utils snd_soc_sigmadsp_regmap snd_soc_sigmadsp raid_class hid_roccat_konepure hid_roccat_common hid_roccat c2port_duramar2150 core mdio_bcm_unimac iptable_security iptable_raw iptable_mangle
    iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel hsr veth netdevsim devlink vxcan batman_adv cfg80211 rfkill chnl_net caif nlmon dummy team bonding vcan bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel joydev mousedev ide_pci_generic piix aesni_intel aes_x86_64 ide_core crypto_simd atkbd cryptd glue_helper serio_raw ata_generic pata_acpi i2c_piix4 floppy sch_fq_codel ip_tables x_tables ipv6 [last unloaded: lm73]
    Dumping ftrace buffer:
    (ftrace buffer empty)
    ---[ end trace 770020de38961fd0 ]---

    A new dir entry can be created in get_subdir() and its 'header->parent'
    is set to NULL. Only after insert_header() succeeds is it set to 'dir';
    otherwise 'header->parent' is set to NULL and drop_sysctl_table() is
    called. However, in the error handling path of get_subdir(),
    drop_sysctl_table() is also called on 'new->header' regardless of the
    value of its parent pointer. Then put_links() is called, which triggers
    a NULL pointer dereference when accessing a member of header->parent.

    In fact we have multiple error paths which call drop_sysctl_table()
    there; upon failure of insert_links() we also call drop_sysctl_table().
    And even in the successful case of __register_sysctl_table() we still
    always call drop_sysctl_table(). This patch fixes it.

    Link: http://lkml.kernel.org/r/20190314085527.13244-1-yuehaibing@huawei.com
    Fixes: 0e47c99d7fe25 ("sysctl: Replace root_list with links between sysctl_table_sets")
    Signed-off-by: YueHaibing
    Reported-by: Hulk Robot
    Acked-by: Luis Chamberlain
    Cc: Kees Cook
    Cc: Alexey Dobriyan
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: Al Viro
    Cc: Eric W. Biederman
    Cc: stable@vger.kernel.org [3.4+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    YueHaibing
     
  • commit e6a9467ea14bae8691b0f72c500510c42ea8edb8 upstream.

    ocfs2_reflink_inodes_lock() can swap the inode1/inode2 variables so that
    we always grab cluster locks in order of increasing inode number.

    Unfortunately, we forget to swap the inode record buffer head pointers
    when we've done this, which leads to incorrect bookkeeping when we're
    trying to make the two inodes have the same refcount tree.

    This has the effect of causing filesystem shutdowns if you're trying to
    reflink data from inode 100 into inode 97, where inode 100 already has a
    refcount tree attached and inode 97 doesn't. The reflink code decides
    to copy the refcount tree pointer from 100 to 97, but uses inode 97's
    inode record to open the tree root (which it doesn't have) and blows up.
    This issue causes filesystem shutdowns and metadata corruption!

    Link: http://lkml.kernel.org/r/20190312214910.GK20533@magnolia
    Fixes: 29ac8e856cb369 ("ocfs2: implement the VFS clone_range, copy_range, and dedupe_range features")
    Signed-off-by: Darrick J. Wong
    Reviewed-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Joseph Qi
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Darrick J. Wong
     
  • commit 73601ea5b7b18eb234219ae2adf77530f389da79 upstream.

    syzbot is hitting lockdep warning [1] due to trying to open a fifo
    during an execve() operation. But we don't need to open non-regular
    files during an execve() operation, because the only files we need are
    the executable file itself and interpreter programs such as /bin/sh and
    ld-linux.so.2.

    Since the manpage for execve(2) says that execve() returns EACCES when
    the file or a script interpreter is not a regular file, and the manpage
    for uselib(2) says that uselib() can return EACCES, and we use
    FMODE_EXEC when opening for execve()/uselib(), we can bail out if a non
    regular file is requested with FMODE_EXEC set.

    Since this deadlock followed by khungtaskd warnings is trivially
    reproducible by a local unprivileged user, and syzbot's frequent crashes
    due to this deadlock delay finding other bugs, let's work around this
    deadlock until we get a chance to find a better solution.

    [1] https://syzkaller.appspot.com/bug?id=b5095bfec44ec84213bac54742a82483aad578ce

    Link: http://lkml.kernel.org/r/1552044017-7890-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
    Reported-by: syzbot
    Fixes: 8924feff66f35fe2 ("splice: lift pipe_lock out of splice_to_pipe()")
    Signed-off-by: Tetsuo Handa
    Acked-by: Kees Cook
    Cc: Al Viro
    Cc: Eric Biggers
    Cc: Dmitry Vyukov
    Cc: stable@vger.kernel.org [4.9+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Tetsuo Handa
     
  • commit 0cb98abb5bd13b9a636bde603d952d722688b428 upstream.

    Allow the async RPC task to finish and update the open state if needed,
    then free the slot. Otherwise, the async RPC task is unable to decode
    the reply.

    Signed-off-by: Olga Kornievskaia
    Fixes: ae55e59da0e4 ("pnfs: Don't release the sequence slot...")
    Cc: stable@vger.kernel.org # v4.18+
    Signed-off-by: Trond Myklebust
    Signed-off-by: Greg Kroah-Hartman

    Olga Kornievskaia
     
  • commit 4a9be28c45bf02fa0436808bb6c0baeba30e120e upstream.

    If the last NFSv3 unmount from a given host races with a mount from the
    same host, we can destroy an nlm_host that is still in use.

    Specifically nlmclnt_lookup_host() can increment h_count on
    an nlm_host that nlmclnt_release_host() has just successfully called
    refcount_dec_and_test() on.
    Once nlmclnt_lookup_host() drops the mutex, nlm_destroy_host_lock()
    will be called to destroy the nlmclnt which is now in use again.

    The cause of the problem is that the dec_and_test happens outside the
    locked region. This is easily fixed by using
    refcount_dec_and_mutex_lock().
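
    The idea of making the final decrement and the lock acquisition one
    step can be sketched in userspace C11. This is a simplified model and
    not the kernel implementation: struct toy_lock, dec_and_lock() and the
    spinning lock are assumptions made for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy spinlock standing in for the kernel mutex; illustrative only. */
struct toy_lock { atomic_flag held; };

static void toy_lock_acquire(struct toy_lock *l)
{
	while (atomic_flag_test_and_set(&l->held))
		; /* spin until the holder releases the lock */
}

static void toy_lock_release(struct toy_lock *l)
{
	atomic_flag_clear(&l->held);
}

/*
 * Drop one reference. Returns true with the lock HELD if this was the
 * last reference, so the caller can unhash and free the object while
 * lookups (which take the same lock) are excluded. Returns false with
 * the lock not held otherwise.
 */
static bool dec_and_lock(atomic_int *count, struct toy_lock *l)
{
	int old = atomic_load(count);

	/* Fast path: clearly not the last reference, no lock needed. */
	while (old > 1) {
		if (atomic_compare_exchange_weak(count, &old, old - 1))
			return false;
	}

	/* Possibly the last reference: decide while holding the lock. */
	toy_lock_acquire(l);
	if (atomic_fetch_sub(count, 1) == 1)
		return true;
	toy_lock_release(l);
	return false;
}
```

    Because the count can only reach zero while the lock is held, a
    lookup that takes the same lock before incrementing the count cannot
    revive an object that is about to be destroyed.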

    Fixes: 8ea6ecc8b075 ("lockd: Create client-side nlm_host cache")
    Cc: stable@vger.kernel.org (v2.6.38+)
    Signed-off-by: NeilBrown
    Signed-off-by: Trond Myklebust
    Signed-off-by: Greg Kroah-Hartman

    NeilBrown
     
  • commit 0ccc3876e4b2a1559a4dbe3126dda4459d38a83b upstream.

    Back in commit a89ca6f24ffe4 ("Btrfs: fix fsync after truncate when
    no_holes feature is enabled") I added an assertion that is triggered when
    an inline extent is found, to assert that the length of the (uncompressed)
    data the extent represents is the same as the i_size of the inode. That
    is true most of the time, and I couldn't find or didn't remember any
    exception at that time. Later on the assertion was expanded twice to
    deal with a case of a compressed inline extent representing a range that
    matches the sector size followed by an expanding truncate, and another
    case where fallocate can update the i_size of the inode without adding
    or updating existing extents (if the fallocate range falls entirely within
    the first block of the file). These two expansions/fixes of the assertion
    were done by commit 7ed586d0a8241 ("Btrfs: fix assertion on fsync of
    regular file when using no-holes feature") and commit 6399fb5a0b69a
    ("Btrfs: fix assertion failure during fsync in no-holes mode").
    These however missed the case where a fallocate expands the i_size of an
    inode to exactly the sector size and an inline extent exists, for example:

    $ mkfs.btrfs -f -O no-holes /dev/sdc
    $ mount /dev/sdc /mnt

    $ xfs_io -f -c "pwrite -S 0xab 0 1096" /mnt/foobar
    wrote 1096/1096 bytes at offset 0
    1 KiB, 1 ops; 0.0002 sec (4.448 MiB/sec and 4255.3191 ops/sec)

    $ xfs_io -c "falloc 1096 3000" /mnt/foobar
    $ xfs_io -c "fsync" /mnt/foobar
    Segmentation fault

    $ dmesg
    [701253.602385] assertion failed: len == i_size || (len == fs_info->sectorsize && btrfs_file_extent_compression(leaf, extent) != BTRFS_COMPRESS_NONE) || (len < i_size && i_size < fs_info->sectorsize), file: fs/btrfs/tree-log.c, line: 4727
    [701253.602962] ------------[ cut here ]------------
    [701253.603224] kernel BUG at fs/btrfs/ctree.h:3533!
    [701253.603503] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
    [701253.603774] CPU: 2 PID: 7192 Comm: xfs_io Tainted: G W 5.0.0-rc8-btrfs-next-45 #1
    [701253.604054] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
    [701253.604650] RIP: 0010:assfail.constprop.23+0x18/0x1a [btrfs]
    (...)
    [701253.605591] RSP: 0018:ffffbb48c186bc48 EFLAGS: 00010286
    [701253.605914] RAX: 00000000000000de RBX: ffff921d0a7afc08 RCX: 0000000000000000
    [701253.606244] RDX: 0000000000000000 RSI: ffff921d36b16868 RDI: ffff921d36b16868
    [701253.606580] RBP: ffffbb48c186bcf0 R08: 0000000000000000 R09: 0000000000000000
    [701253.606913] R10: 0000000000000003 R11: 0000000000000000 R12: ffff921d05d2de18
    [701253.607247] R13: ffff921d03b54000 R14: 0000000000000448 R15: ffff921d059ecf80
    [701253.607769] FS: 00007f14da906700(0000) GS:ffff921d36b00000(0000) knlGS:0000000000000000
    [701253.608163] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [701253.608516] CR2: 000056087ea9f278 CR3: 00000002268e8001 CR4: 00000000003606e0
    [701253.608880] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [701253.609250] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [701253.609608] Call Trace:
    [701253.609994] btrfs_log_inode+0xdfb/0xe40 [btrfs]
    [701253.610383] btrfs_log_inode_parent+0x2be/0xa60 [btrfs]
    [701253.610770] ? do_raw_spin_unlock+0x49/0xc0
    [701253.611150] btrfs_log_dentry_safe+0x4a/0x70 [btrfs]
    [701253.611537] btrfs_sync_file+0x3b2/0x440 [btrfs]
    [701253.612010] ? do_sysinfo+0xb0/0xf0
    [701253.612552] do_fsync+0x38/0x60
    [701253.612988] __x64_sys_fsync+0x10/0x20
    [701253.613360] do_syscall_64+0x60/0x1b0
    [701253.613733] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [701253.614103] RIP: 0033:0x7f14da4e66d0
    (...)
    [701253.615250] RSP: 002b:00007fffa670fdb8 EFLAGS: 00000246 ORIG_RAX: 000000000000004a
    [701253.615647] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f14da4e66d0
    [701253.616047] RDX: 000056087ea9c260 RSI: 000056087ea9c260 RDI: 0000000000000003
    [701253.616450] RBP: 0000000000000001 R08: 0000000000000020 R09: 0000000000000010
    [701253.616854] R10: 000000000000009b R11: 0000000000000246 R12: 000056087ea9c260
    [701253.617257] R13: 000056087ea9c240 R14: 0000000000000000 R15: 000056087ea9dd10
    (...)
    [701253.619941] ---[ end trace e088d74f132b6da5 ]---

    Updating the assertion again to allow for this particular case would result
    in a meaningless assertion, plus there is currently no risk of logging
    content that would result in any corruption after a log replay if the size
    of the data encoded in an inline extent is greater than the inode's i_size
    (which is not currently possible either with or without compression),
    therefore just remove the assertion.

    CC: stable@vger.kernel.org # 4.4+
    Signed-off-by: Filipe Manana
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Filipe Manana
     
  • commit 139a56170de67101791d6e6c8e940c6328393fe9 upstream.

    qgroup_rsv_size is calculated as the product of
    outstanding_extent * fs_info->nodesize. The product is calculated with
    32 bit precision since both variables are defined as u32. Yet
    qgroup_rsv_size expects a 64 bit result.

    Avoid possible multiplication overflow by casting outstanding_extent to
    u64. Such overflow would in the worst case (64K nodesize) require at
    least 65536 extents, which is quite large, and it's not likely that it
    would happen in practice.
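
    The difference between the two ways of computing the product can be
    shown with a small userspace sketch. The function names below are
    hypothetical and uint32_t/uint64_t stand in for the kernel's u32/u64;
    the sketch assumes a platform where int is 32 bits wide.

```c
#include <stdint.h>

/* Product computed in 32 bits, then widened: the high bits are already lost. */
static uint64_t rsv_size_32bit(uint32_t outstanding_extents, uint32_t nodesize)
{
	return outstanding_extents * nodesize;
}

/* One operand is widened first, so the multiplication is done in 64 bits. */
static uint64_t rsv_size_64bit(uint32_t outstanding_extents, uint32_t nodesize)
{
	return (uint64_t)outstanding_extents * nodesize;
}
```

    With a 64K nodesize and 65536 extents the true product is 2^32, which
    the 32-bit version wraps to 0 while the 64-bit version keeps intact.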

    Fixes-coverity-id: 1435101
    Fixes: ff6bc37eb7f6 ("btrfs: qgroup: Use independent and accurate per inode qgroup rsv")
    CC: stable@vger.kernel.org # 4.19+
    Reviewed-by: Qu Wenruo
    Signed-off-by: Nikolay Borisov
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Nikolay Borisov
     
  • commit 3897b6f0a859288c22fb793fad11ec2327e60fcd upstream.

    Parity page is incorrectly unmapped in finish_parity_scrub(), triggering
    a reference counter bug on i386, i.e.:

    [ 157.662401] kernel BUG at mm/highmem.c:349!
    [ 157.666725] invalid opcode: 0000 [#1] SMP PTI

    The reason is that kunmap(p_page) was completely left out, so we never
    did an unmap for the p_page and the loop unmapping the rbio page was
    iterating over the wrong number of stripes: unmapping should be done
    with nr_data instead of rbio->real_stripes.

    Test case to reproduce the bug:

    - create a raid5 btrfs filesystem:
    # mkfs.btrfs -m raid5 -d raid5 /dev/sdb /dev/sdc /dev/sdd /dev/sde

    - mount it:
    # mount /dev/sdb /mnt

    - run btrfs scrub in a loop:
    # while :; do btrfs scrub start -BR /mnt; done

    BugLink: https://bugs.launchpad.net/bugs/1812845
    Fixes: 5a6ac9eacb49 ("Btrfs, raid56: support parity scrub on raid56")
    CC: stable@vger.kernel.org # 4.4+
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Andrea Righi
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Andrea Righi
     
  • commit 0cc068e6ee59c1fffbfa977d8bf868b7551d80ac upstream.

    As readahead is an optimization, all errors are usually filtered out,
    but still properly handled when the real read call is done. The commit
    5e9d398240b2 ("btrfs: readpages() should submit IO as read-ahead") added
    REQ_RAHEAD to readpages() because that's only used for readahead
    (despite what one would expect from the callback name).

    This causes a flood of messages and inflated read error stats, so skip
    reporting in case it's readahead.

    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202403
    Reported-by: LimeTech
    Fixes: 5e9d398240b2 ("btrfs: readpages() should submit IO as read-ahead")
    CC: stable@vger.kernel.org # 4.19+
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    David Sterba
     
  • commit 2cc8334270e281815c3850c3adea363c51f21e0d upstream.

    When Filipe added the recursive directory logging stuff in
    2f2ff0ee5e430 ("Btrfs: fix metadata inconsistencies after directory
    fsync") he specifically didn't take the directory i_mutex for the
    children directories that we need to log because of lockdep. This is
    generally fine, but can lead to this WARN_ON() tripping if we happen to
    run delayed deletions in between our first search and our second search
    of dir_item/dir_indexes for this directory. We expect this to happen,
    so the WARN_ON() isn't necessary. Drop the WARN_ON() and add a comment
    so we know why this case can happen.

    CC: stable@vger.kernel.org # 4.4+
    Reviewed-by: Filipe Manana
    Signed-off-by: Josef Bacik
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Josef Bacik
     
  • commit bf504110bc8aa05df48b0e5f0aa84bfb81e0574b upstream.

    If we do a shrinking truncate against an inode which is already present
    in the respective log tree and then rename it, as part of logging the new
    name we end up logging an inode item that reflects the old size of the
    file (the one which we previously logged) and not the new smaller size.
    The decision to preserve the size previously logged was added by commit
    1a4bcf470c886b ("Btrfs: fix fsync data loss after adding hard link to
    inode") in order to avoid data loss after replaying the log. However that
    decision is only needed for the case the logged inode size is smaller than
    the current size of the inode, as explained in that commit's change log.
    If the current size of the inode is smaller than the previously logged
    size, we know a shrinking truncate happened and therefore need to use
    that smaller size.

    Example to trigger the problem:

    $ mkfs.btrfs -f /dev/sdb
    $ mount /dev/sdb /mnt

    $ xfs_io -f -c "pwrite -S 0xab 0 8000" /mnt/foo
    $ xfs_io -c "fsync" /mnt/foo
    $ xfs_io -c "truncate 3000" /mnt/foo

    $ mv /mnt/foo /mnt/bar
    $ xfs_io -c "fsync" /mnt/bar

    (power failure)

    $ mount /dev/sdb /mnt
    $ od -t x1 -A d /mnt/bar
    0000000 ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab
    *
    0008000

    Once we rename the file, we log its name (and inode item), and because
    the inode was already logged before in the current transaction, we log it
    with a size of 8000 bytes because that is the size we previously logged
    (with the first fsync). As part of the rename, besides logging the inode,
    we do also sync the log, which is done since commit d4682ba03ef618
    ("Btrfs: sync log after logging new name"), so the next fsync against our
    inode is effectively a no-op, since no new changes happened since the
    rename operation. Even if we did not sync the log during the rename
    operation, the same problem (file size of 8000 bytes instead of 3000
    bytes) would be visible after replaying the log if the log ended up
    getting synced to disk through some other means, such as for example by
    fsyncing some other modified file. In the example above the fsync after
    the rename operation is there just because not every filesystem may
    guarantee logging/journalling the inode (and syncing the log/journal)
    during the rename operation, for example it is needed for f2fs, but not
    for ext4 and xfs.

    Fix this scenario by, when logging a new name (which is triggered by
    rename and link operations), using the current size of the inode instead
    of the previously logged inode size.

    A test case for fstests follows soon.

    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202695
    CC: stable@vger.kernel.org # 4.4+
    Reported-by: Seulbae Kim
    Signed-off-by: Filipe Manana
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Filipe Manana
     

27 Mar, 2019

7 commits

  • commit 48432984d718c95cf13e26d487c2d1b697c3c01f upstream.

    Thread A                                Thread B
    - __fput
     - f2fs_release_file
      - drop_inmem_pages
       - mutex_lock(&fi->inmem_lock)
        - __revoke_inmem_pages
         - lock_page(page)
                                            - open
                                             - f2fs_setattr
                                              - truncate_setsize
                                               - truncate_inode_pages_range
                                                - lock_page(page)
                                                - truncate_cleanup_page
                                                 - f2fs_invalidate_page
                                                  - drop_inmem_page
                                                  - mutex_lock(&fi->inmem_lock);
    We may encounter the above ABBA deadlock, as reported by Kyungtae Kim:

    I'm reporting a bug in linux-4.17.19: "INFO: task hung in
    drop_inmem_page" (no reproducer)

    I think this might be somehow related to the following:
    https://groups.google.com/forum/#!searchin/syzkaller-bugs/INFO$3A$20task$20hung$20in$20%7Csort:date/syzkaller-bugs/c6soBTrdaIo/AjAzPeIzCgAJ

    =========================================
    INFO: task syz-executor7:10822 blocked for more than 120 seconds.
    Not tainted 4.17.19 #1
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    syz-executor7 D27024 10822 6346 0x00000004
    Call Trace:
    context_switch kernel/sched/core.c:2867 [inline]
    __schedule+0x721/0x1e60 kernel/sched/core.c:3515
    schedule+0x88/0x1c0 kernel/sched/core.c:3559
    schedule_preempt_disabled+0x18/0x30 kernel/sched/core.c:3617
    __mutex_lock_common kernel/locking/mutex.c:833 [inline]
    __mutex_lock+0x5bd/0x1410 kernel/locking/mutex.c:893
    mutex_lock_nested+0x1b/0x20 kernel/locking/mutex.c:908
    drop_inmem_page+0xcb/0x810 fs/f2fs/segment.c:327
    f2fs_invalidate_page+0x337/0x5e0 fs/f2fs/data.c:2401
    do_invalidatepage mm/truncate.c:165 [inline]
    truncate_cleanup_page+0x261/0x330 mm/truncate.c:187
    truncate_inode_pages_range+0x552/0x1610 mm/truncate.c:367
    truncate_inode_pages mm/truncate.c:478 [inline]
    truncate_pagecache+0x6d/0x90 mm/truncate.c:801
    truncate_setsize+0x81/0xa0 mm/truncate.c:826
    f2fs_setattr+0x44f/0x1270 fs/f2fs/file.c:781
    notify_change+0xa62/0xe80 fs/attr.c:313
    do_truncate+0x12e/0x1e0 fs/open.c:63
    do_last fs/namei.c:2955 [inline]
    path_openat+0x2042/0x29f0 fs/namei.c:3505
    do_filp_open+0x1bd/0x2c0 fs/namei.c:3540
    do_sys_open+0x35e/0x4e0 fs/open.c:1101
    __do_sys_open fs/open.c:1119 [inline]
    __se_sys_open fs/open.c:1114 [inline]
    __x64_sys_open+0x89/0xc0 fs/open.c:1114
    do_syscall_64+0xc4/0x4e0 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x4497b9
    RSP: 002b:00007f734e459c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
    RAX: ffffffffffffffda RBX: 00007f734e45a6cc RCX: 00000000004497b9
    RDX: 0000000000000104 RSI: 00000000000a8280 RDI: 0000000020000080
    RBP: 000000000071bea0 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
    R13: 0000000000007230 R14: 00000000006f02d0 R15: 00007f734e45a700
    INFO: task syz-executor7:10858 blocked for more than 120 seconds.
    Not tainted 4.17.19 #1
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    syz-executor7 D28880 10858 6346 0x00000004
    Call Trace:
    context_switch kernel/sched/core.c:2867 [inline]
    __schedule+0x721/0x1e60 kernel/sched/core.c:3515
    schedule+0x88/0x1c0 kernel/sched/core.c:3559
    __rwsem_down_write_failed_common kernel/locking/rwsem-xadd.c:565 [inline]
    rwsem_down_write_failed+0x5e6/0xc90 kernel/locking/rwsem-xadd.c:594
    call_rwsem_down_write_failed+0x17/0x30 arch/x86/lib/rwsem.S:117
    __down_write arch/x86/include/asm/rwsem.h:142 [inline]
    down_write+0x58/0xa0 kernel/locking/rwsem.c:72
    inode_lock include/linux/fs.h:713 [inline]
    do_truncate+0x120/0x1e0 fs/open.c:61
    do_last fs/namei.c:2955 [inline]
    path_openat+0x2042/0x29f0 fs/namei.c:3505
    do_filp_open+0x1bd/0x2c0 fs/namei.c:3540
    do_sys_open+0x35e/0x4e0 fs/open.c:1101
    __do_sys_open fs/open.c:1119 [inline]
    __se_sys_open fs/open.c:1114 [inline]
    __x64_sys_open+0x89/0xc0 fs/open.c:1114
    do_syscall_64+0xc4/0x4e0 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x4497b9
    RSP: 002b:00007f734e3b4c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
    RAX: ffffffffffffffda RBX: 00007f734e3b56cc RCX: 00000000004497b9
    RDX: 0000000000000104 RSI: 00000000000a8280 RDI: 0000000020000080
    RBP: 000000000071c238 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
    R13: 0000000000007230 R14: 00000000006f02d0 R15: 00007f734e3b5700
    INFO: task syz-executor5:10829 blocked for more than 120 seconds.
    Not tainted 4.17.19 #1
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    syz-executor5 D28760 10829 6308 0x80000002
    Call Trace:
    context_switch kernel/sched/core.c:2867 [inline]
    __schedule+0x721/0x1e60 kernel/sched/core.c:3515
    schedule+0x88/0x1c0 kernel/sched/core.c:3559
    io_schedule+0x21/0x80 kernel/sched/core.c:5179
    wait_on_page_bit_common mm/filemap.c:1100 [inline]
    __lock_page+0x2b5/0x390 mm/filemap.c:1273
    lock_page include/linux/pagemap.h:483 [inline]
    __revoke_inmem_pages+0xb35/0x11c0 fs/f2fs/segment.c:231
    drop_inmem_pages+0xa3/0x3e0 fs/f2fs/segment.c:306
    f2fs_release_file+0x2c7/0x330 fs/f2fs/file.c:1556
    __fput+0x2c7/0x780 fs/file_table.c:209
    ____fput+0x1a/0x20 fs/file_table.c:243
    task_work_run+0x151/0x1d0 kernel/task_work.c:113
    exit_task_work include/linux/task_work.h:22 [inline]
    do_exit+0x8ba/0x30a0 kernel/exit.c:865
    do_group_exit+0x13b/0x3a0 kernel/exit.c:968
    get_signal+0x6bb/0x1650 kernel/signal.c:2482
    do_signal+0x84/0x1b70 arch/x86/kernel/signal.c:810
    exit_to_usermode_loop+0x155/0x190 arch/x86/entry/common.c:162
    prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
    syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
    do_syscall_64+0x445/0x4e0 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x4497b9
    RSP: 002b:00007f1c68e74ce8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
    RAX: fffffffffffffe00 RBX: 000000000071bf80 RCX: 00000000004497b9
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000071bf80
    RBP: 000000000071bf80 R08: 0000000000000000 R09: 000000000071bf58
    R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
    R13: 0000000000000000 R14: 00007f1c68e759c0 R15: 00007f1c68e75700

    This patch uses trylock_page() to mitigate this deadlock condition.
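
    How a trylock breaks an ABBA ordering can be sketched in userspace
    C11. This is a toy model under stated assumptions, not the f2fs code:
    struct toy_page, toy_trylock_page() and revoke_pages() are
    hypothetical names, and the outer mutex is not modelled at all.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy page with a lock bit; stands in for a struct page. */
struct toy_page { atomic_flag locked; bool revoked; };

static bool toy_trylock_page(struct toy_page *p)
{
	/* test_and_set returns the previous value: false means we got the lock. */
	return !atomic_flag_test_and_set(&p->locked);
}

static void toy_unlock_page(struct toy_page *p)
{
	atomic_flag_clear(&p->locked);
}

/*
 * Meant to be called with the (not modelled) list mutex held. Revokes
 * every page whose lock can be taken without blocking and returns how
 * many pages had to be skipped, so the caller can drop the mutex and
 * retry later instead of sleeping on a page lock.
 */
static size_t revoke_pages(struct toy_page *pages, size_t n)
{
	size_t skipped = 0;

	for (size_t i = 0; i < n; i++) {
		if (pages[i].revoked)
			continue;
		if (!toy_trylock_page(&pages[i])) {
			skipped++;	/* held elsewhere: do not block */
			continue;
		}
		pages[i].revoked = true;
		toy_unlock_page(&pages[i]);
	}
	return skipped;
}
```

    Since the holder of the mutex never sleeps waiting for a page lock,
    the thread that holds the page lock can eventually take the mutex,
    and the cycle shown in the diagram cannot form.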

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Greg Kroah-Hartman

    Chao Yu
     
  • commit 674a2b27234d1b7afcb0a9162e81b2e53aeef217 upstream.

    All indirect buffers obtained by ext4_find_shared() should be released
    no matter whether the branch should be freed or not. But currently we
    forget to release the lower depth indirect buffers when removing space
    from the same higher depth indirect block. This leads to a buffer leak
    and, furthermore, it may lead to quota information corruption when
    using old quota. Consider the following case:

    - Create and mount an empty ext4 filesystem without the extent and
      quota features.
    - Run quotacheck and enable the user & group quota.
    - Create some files and write some data to them, then punch holes in
      some of them; this may trigger the buffer leak problem mentioned
      above.
    - Disable quota and run quotacheck again. It will create two new
      aquota files and write the checked quota information to them, which
      may reuse the freed indirect block (whose buffer and page cache
      were not freed) as a data block.
    - Enable quota again. It will invoke
      vfs_load_quota_inode()->invalidate_bdev() to try to clean unused
      buffers and pagecache. Unfortunately, because the buffer of the
      quota data block is still referenced, the quota code cannot read
      the up-to-date quota info from the device, which leads to quota
      information corruption.

    This problem can be reproduced by xfstests generic/231 on ext3 file
    system or ext4 file system without extent and quota features.

    This patch fixes this problem by releasing the missing indirect buffers
    in ext4_ind_remove_space().

    Reported-by: Hulk Robot
    Signed-off-by: zhangyi (F)
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jan Kara
    Cc: stable@kernel.org
    Signed-off-by: Greg Kroah-Hartman

    zhangyi (F)
     
  • commit 372a03e01853f860560eade508794dd274e9b390 upstream.

    Ext4 needs to serialize unaligned direct AIO because the zeroing of
    partial blocks of two competing unaligned AIOs can result in data
    corruption.

    However it decides not to serialize if the potentially unaligned aio is
    past i_size with the rationale that no pending writes are possible past
    i_size. Unfortunately if the i_size is not block aligned and the second
    unaligned write lands past i_size, but still into the same block, it has
    the potential of corrupting the previous unaligned write to the same
    block.

    This is a (very simplified) reproducer from Frank:

    // 41472 = (10 * 4096) + 512
    // 37376 = 41472 - 4096

    ftruncate(fd, 41472);
    io_prep_pwrite(iocbs[0], fd, buf[0], 4096, 37376);
    io_prep_pwrite(iocbs[1], fd, buf[1], 4096, 41472);

    io_submit(io_ctx, 1, &iocbs[0]);
    io_submit(io_ctx, 1, &iocbs[1]);

    io_getevents(io_ctx, 2, 2, events, NULL);

    Without this patch the 512B range from 40960 up to the start of the
    second unaligned write (41472) is going to be zeroed overwriting the data
    written by the first write. This is a data corruption.

    00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    *
    00009200 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30
    *
    0000a000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    *
    0000a200 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31

    With this patch the data corruption is avoided because we will recognize
    the unaligned_aio and wait for the unwritten extent conversion.

    00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    *
    00009200 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30
    *
    0000a200 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31
    *
    0000b200
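
    The offsets in the reproducer can be checked with a small userspace
    sketch. The "new" rule below, which rounds i_size up to the block
    boundary before comparing, is an assumption based on the description
    above and not the literal patch; all function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Round x up to the next multiple of blocksize (a power of two). */
static uint64_t round_up_block(uint64_t x, uint64_t blocksize)
{
	return (x + blocksize - 1) & ~(blocksize - 1);
}

/* True if [pos, pos + len) does not start and end on block boundaries. */
static bool is_unaligned(uint64_t pos, uint64_t len, uint64_t blocksize)
{
	return ((pos | (pos + len)) & (blocksize - 1)) != 0;
}

/* Old rule as described: any write starting at or past i_size is not serialized. */
static bool serialize_old(uint64_t pos, uint64_t len, uint64_t i_size,
			  uint64_t blocksize)
{
	if (pos >= i_size)
		return false;
	return is_unaligned(pos, len, blocksize);
}

/*
 * Assumed new rule: only writes starting at or past the end of the block
 * that contains i_size may skip serialization.
 */
static bool serialize_new(uint64_t pos, uint64_t len, uint64_t i_size,
			  uint64_t blocksize)
{
	if (pos >= round_up_block(i_size, blocksize))
		return false;
	return is_unaligned(pos, len, blocksize);
}
```

    For the second write in the reproducer (pos 41472, len 4096, i_size
    41472, 4096-byte blocks), the old rule skips serialization although
    the write starts inside the block beginning at 40960, while the
    rounded-up comparison against 45056 serializes it.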

    Reported-by: Frank Sorenson
    Signed-off-by: Lukas Czerner
    Signed-off-by: Theodore Ts'o
    Fixes: e9e3bcecf44c ("ext4: serialize unaligned asynchronous DIO")
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Lukas Czerner
     
  • commit fa30dde38aa8628c73a6dded7cb0bba38c27b576 upstream.

    We see the following NULL pointer dereference while running xfstests
    generic/475:
    BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
    PGD 8000000c84bad067 P4D 8000000c84bad067 PUD c84e62067 PMD 0
    Oops: 0000 [#1] SMP PTI
    CPU: 7 PID: 9886 Comm: fsstress Kdump: loaded Not tainted 5.0.0-rc8 #10
    RIP: 0010:ext4_do_update_inode+0x4ec/0x760
    ...
    Call Trace:
    ? jbd2_journal_get_write_access+0x42/0x50
    ? __ext4_journal_get_write_access+0x2c/0x70
    ? ext4_truncate+0x186/0x3f0
    ext4_mark_iloc_dirty+0x61/0x80
    ext4_mark_inode_dirty+0x62/0x1b0
    ext4_truncate+0x186/0x3f0
    ? unmap_mapping_pages+0x56/0x100
    ext4_setattr+0x817/0x8b0
    notify_change+0x1df/0x430
    do_truncate+0x5e/0x90
    ? generic_permission+0x12b/0x1a0

    This is triggered because the NULL pointer handle->h_transaction was
    dereferenced in function ext4_update_inode_fsync_trans().
    I found that h_transaction was set to NULL in jbd2__journal_restart()
    but failed to attach to a new transaction while the journal was aborted.

    Fix this by checking the handle before updating the inode.

    Fixes: b436b9bef84d ("ext4: Wait for proper transaction commit on fsync")
    Signed-off-by: Jiufei Xue
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Joseph Qi
    Cc: stable@kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Jiufei Xue
     
  • commit 8c11a607d1d9cd6e7f01fd6b03923597fb0ef95a upstream.

    Work around a problem with Samba responses to SMB3.1.1
    null user (guest) mounts. The server doesn't set the
    expected flag in the session setup response so we have
    to do a similar check to what is done in smb3_validate_negotiate
    where we also check if the user is a null user (but not sec=krb5
    since username might not be passed in on mount for Kerberos case).

    Note that the commit below tightened the conditions and forced signing
    for the SMB2-TreeConnect commands as per MS-SMB2.
    However, this should only apply to normal user sessions and not for
    cases where there is no user (even if server forgets to set the flag
    in the response) since we don't have anything useful to sign with.
    This is especially important now that the more secure SMB3.1.1 protocol
    is in the default dialect list.

    An earlier patch ("cifs: allow guest mounts to work for smb3.11") fixed
    the guest mounts to Windows.

    Fixes: 6188f28bf608 ("Tree connect for SMB3.1.1 must be signed for non-encrypted shares")

    Reviewed-by: Ronnie Sahlberg
    Reviewed-by: Paulo Alcantara
    CC: Stable
    Signed-off-by: Steve French
    Signed-off-by: Greg Kroah-Hartman

    Steve French
     
  • commit e71ab2aa06f731a944993120b0eef1556c63b81c upstream.

    Fix Guest/Anonymous sessions so that they work with SMB 3.11.

    The commit noted below tightened the conditions and forced signing for
    the SMB2-TreeConnect commands as per MS-SMB2.
    However, this should only apply to normal user sessions and not for
    Guest/Anonymous sessions.

    Fixes: 6188f28bf608 ("Tree connect for SMB3.1.1 must be signed for non-encrypted shares")

    Signed-off-by: Ronnie Sahlberg
    CC: Stable
    Signed-off-by: Steve French
    Signed-off-by: Greg Kroah-Hartman

    Ronnie Sahlberg
     
  • commit d3ca4651d05c0ff7259d087d8c949bcf3e14fb46 upstream.

    When truncate(2) hits an IO error while reading an indirect extent block,
    the code just bugs with:

    kernel BUG at linux-4.15.0/fs/udf/truncate.c:249!
    ...

    Fix the problem by bailing out cleanly in case of IO error.

    CC: stable@vger.kernel.org
    Reported-by: jean-luc malet
    Signed-off-by: Jan Kara
    Signed-off-by: Greg Kroah-Hartman

    Jan Kara
     

24 Mar, 2019

1 commit