12 Feb, 2019

2 commits

  • So that we can also poll non-blk-mq queues. Mostly needed for
    the NVMe multipath code, but could also be useful elsewhere.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe
    (cherry picked from commit ea435e1b9392a33deceaea2a16ebaa3397bead93)

    Christoph Hellwig
     
  • Fix a bug introduced by commit 74d46992e0d9 ("block: replace bi_bdev with a gendisk
    pointer and partitions index").

    __btrfsic_submit_bio() uses bio_dev(bio) to find the dev_state. But
    when a dev_state is added to the hashtable, it is keyed by the dev_t
    of the block_device.

    bio_dev(bio) returns the dev_t of part0, which differs from the dev_t
    in block_device (bd_dev). bd_dev in block_device represents the exact
    partition.

    block_device.bd_dev =
    bio->bi_partno (same as block_device.bd_partno) + bio_dev(bio).

    When adding a dev_state into hashtable, we use the exact partition dev_t.
    So when looking it up, it should also use the exact partition dev_t.
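
    A minimal user-space sketch of the fixed lookup, following the formula
    above. The table, the helper names and the sketch_dev_t type are
    hypothetical stand-ins for the integrity checker's hashtable; only the
    20-bit minor layout matches the kernel-internal MKDEV().

```c
#include <assert.h>
#include <stddef.h>

/* Same layout as the kernel-internal MKDEV(): 20 minor bits. */
typedef unsigned int sketch_dev_t;
#define SKETCH_MKDEV(ma, mi) (((sketch_dev_t)(ma) << 20) | (sketch_dev_t)(mi))

#define NR_STATES 8

/* Hypothetical stand-in for the checker's dev_state entries. */
struct dev_state {
    sketch_dev_t dev;   /* key: dev_t of the exact partition */
    int in_use;
};

static struct dev_state table[NR_STATES];

static void dev_state_add(sketch_dev_t dev)
{
    for (size_t i = 0; i < NR_STATES; i++) {
        if (!table[i].in_use) {
            table[i].dev = dev;
            table[i].in_use = 1;
            return;
        }
    }
}

static struct dev_state *dev_state_lookup(sketch_dev_t dev)
{
    for (size_t i = 0; i < NR_STATES; i++)
        if (table[i].in_use && table[i].dev == dev)
            return &table[i];
    return NULL;
}

/* Fixed lookup: whole-disk dev_t (like bio_dev()) plus the partition
 * index (like bio->bi_partno) gives the key used at insertion time. */
static struct dev_state *lookup_for_bio(sketch_dev_t disk_dev, unsigned int partno)
{
    return dev_state_lookup(disk_dev + partno);
}
```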

    Reproducer of this bug:

    Use MOUNT_OPTIONS="-o check_int" and run btrfs/001 in fstests.
    A WARNING like the one below is then printed.

    WARNING:
    btrfs: attempt to write superblock which references block M @29523968 (sda7 /1111654400/2) which is never written!

    Signed-off-by: Gu JinXiang
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba
    (cherry picked from commit d28e649a5c58b779b303c252c66ee84a0f2c3b32)

    Gu JinXiang
     

07 Feb, 2019

6 commits

  • commit b469e7e47c8a075cc08bcd1e85d4365134bdcdd5 upstream.

    When an event is reported on a sub-directory and the parent inode has
    a mark mask with FS_EVENT_ON_CHILD|FS_ISDIR, the event will be sent to
    fsnotify() even if the event type is not in the parent mark mask
    (e.g. FS_OPEN).

    Furthermore, if that event happened on a mount or a filesystem with
    a mount/sb mark that does have that event type in its mask, the "on
    child" event will be reported on the mount/sb mark. That is not
    desired, because the user will get a duplicate event for the same action.

    Note that the event reported on the victim inode is never merged with
    the event reported on the parent inode, because of the check in
    should_merge(): old_fsn->inode == new_fsn->inode.

    Fix this by looking for a match of an actual event type (i.e. not just
    FS_ISDIR) in parent's inode mark mask and by not reporting an "on child"
    event to group if event type is only found on mount/sb marks.
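
    The parent-mask test can be sketched as follows. The flag values match
    include/linux/fsnotify_backend.h, but parent_mark_matches() is a
    hypothetical helper, not the fsnotify code, and the mount/sb half of
    the fix is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as in the kernel's include/linux/fsnotify_backend.h. */
#define FS_MODIFY          0x00000002u
#define FS_OPEN            0x00000020u
#define FS_EVENT_ON_CHILD  0x08000000u
#define FS_ISDIR           0x40000000u

/* Hypothetical helper: should the parent's inode mark receive this
 * "on child" event?  FS_ISDIR and FS_EVENT_ON_CHILD only qualify the
 * event, so they are stripped before looking for a real type match. */
static bool parent_mark_matches(uint32_t parent_mask, uint32_t event_mask)
{
    uint32_t event_type = event_mask & ~(FS_ISDIR | FS_EVENT_ON_CHILD);

    if (!(parent_mask & FS_EVENT_ON_CHILD))
        return false;
    return (parent_mask & event_type) != 0;
}
```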

    [backport hint: The bug seems to have always been in fanotify, but this
    patch will only apply cleanly to v4.19.y]

    Cc: # v4.19
    Signed-off-by: Amir Goldstein
    Signed-off-by: Jan Kara
    [amir: backport to v4.9]
    Signed-off-by: Amir Goldstein
    Signed-off-by: Greg Kroah-Hartman

    Amir Goldstein
     
  • commit 28eb24ff75c5ac130eb326b3b4d0dcecfc0f427d upstream.

    In case a hostname resolves to a different IP address (e.g. long-running
    mounts), make sure to resolve it every time prior to calling
    generic_ip_connect() in reconnect.
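
    A user-space sketch of resolve-on-every-reconnect, using POSIX
    getaddrinfo(). resolve_host() and reconnect() are hypothetical helpers;
    the kernel code does not use getaddrinfo(), but the ordering is the
    same: look the name up again right before each connect attempt instead
    of reusing an address cached at mount time.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Resolve host afresh; returns 0 and fills *out on success. */
static int resolve_host(const char *host, const char *port,
                        struct sockaddr_storage *out, socklen_t *outlen)
{
    struct addrinfo hints, *res;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;

    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;
    memcpy(out, res->ai_addr, res->ai_addrlen);
    *outlen = res->ai_addrlen;
    freeaddrinfo(res);
    return 0;
}

/* Hypothetical reconnect: resolve, then connect. Returns a connected
 * socket or -1. */
int reconnect(const char *host, const char *port)
{
    struct sockaddr_storage addr;
    socklen_t len;
    int fd;

    if (resolve_host(host, port, &addr, &len) != 0)
        return -1;
    fd = socket(addr.ss_family, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&addr, len) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```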

    Suggested-by: Steve French
    Signed-off-by: Paulo Alcantara
    Signed-off-by: Steve French
    Signed-off-by: Pavel Shilovsky
    Signed-off-by: Greg Kroah-Hartman

    Paulo Alcantara
     
  • commit e74c98ca2d6ae4376cc15fa2a22483430909d96b upstream.

    This reverts commit 2d29f6b96d8f80322ed2dd895bca590491c38d34.

    It turns out that the fix can lead to a ~20 percent performance regression
    in initial writes to the page cache according to iozone. Let's revert this
    for now to have more time for a proper fix.

    Cc: stable@vger.kernel.org # v3.13+
    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Bob Peterson
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Andreas Gruenbacher
     
  • commit 8fc75bed96bb94e23ca51bd9be4daf65c57697bf upstream.

    Ensure that we return the fatal error value that caused us to exit
    nfs_page_async_flush().

    Fixes: c373fff7bd25 ("NFSv4: Don't special case "launder"")
    Signed-off-by: Trond Myklebust
    Cc: stable@vger.kernel.org # v4.12+
    Reviewed-by: Benjamin Coddington
    Signed-off-by: Anna Schumaker
    Signed-off-by: Greg Kroah-Hartman

    Trond Myklebust
     
  • commit 1dbd449c9943e3145148cc893c2461b72ba6fef0 upstream.

    The nr_dentry_unused per-cpu counter tracks dentries in both the LRU
    lists and the shrink lists where the DCACHE_LRU_LIST bit is set.

    The shrink_dcache_sb() function moves dentries from the LRU list to a
    shrink list and subtracts the dentry count from nr_dentry_unused. This
    is incorrect as the nr_dentry_unused count will also be decremented in
    shrink_dentry_list() via d_shrink_del().

    To fix this double decrement, the decrement in the shrink_dcache_sb()
    function is taken out.
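
    A toy model of the accounting, assuming a single counter that covers
    both lists. The struct and functions are hypothetical; the point is
    that moving a dentry between lists leaves the counter alone, and only
    the final removal (d_shrink_del() in the kernel) decrements it.

```c
#include <assert.h>

/* Hypothetical model of nr_dentry_unused and the two lists it covers. */
struct dcache_model {
    long nr_unused;   /* dentries on the LRU or a shrink list */
    long on_lru;
    long on_shrink;
};

/* Move n dentries from the LRU to a shrink list. They are still counted
 * as unused, so the counter is not touched (the buggy version also did
 * nr_unused -= n here). */
static void move_to_shrink_list(struct dcache_model *d, long n)
{
    d->on_lru -= n;
    d->on_shrink += n;
}

/* Dispose of the shrink list: one decrement per dentry removed. */
static void shrink_list(struct dcache_model *d)
{
    while (d->on_shrink > 0) {
        d->on_shrink--;
        d->nr_unused--;
        assert(d->nr_unused >= 0);
    }
}
```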

    Fixes: 4e717f5c1083 ("list_lru: remove special case function list_lru_dispose_all.")
    Cc: stable@kernel.org
    Signed-off-by: Waiman Long
    Reviewed-by: Dave Chinner
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Waiman Long
     
  • commit 8e6e72aeceaaed5aeeb1cb43d3085de7ceb14f79 upstream.

    Signed-off-by: Pavel Shilovsky
    Signed-off-by: Steve French
    CC: Stable
    Signed-off-by: Greg Kroah-Hartman

    Pavel Shilovsky
     

31 Jan, 2019

7 commits

  • commit 0d228ece59a35a9b9e8ff0d40653234a6d90f61e upstream.

    At the time of a forced unmount, we place the running replace into the
    BTRFS_IOCTL_DEV_REPLACE_STATE_SUSPENDED state, so when the system comes
    back up we should expect that the target device may be missing.

    In that case, let the replace state remain
    BTRFS_IOCTL_DEV_REPLACE_STATE_SUSPENDED instead of moving to
    BTRFS_IOCTL_DEV_REPLACE_STATE_STARTED, as there isn't any matching scrub
    running as part of the replace.
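
    The resume decision can be sketched as a small state function. The
    enum and resume_replace() are hypothetical simplifications of the
    BTRFS_IOCTL_DEV_REPLACE_STATE_* handling, assuming the only input that
    matters is whether the target device is present.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; they do not carry the kernel's numeric values. */
enum replace_state {
    REPLACE_NEVER_STARTED,
    REPLACE_STARTED,
    REPLACE_SUSPENDED,
};

/* After a forced unmount the replace is SUSPENDED. Only move it back to
 * STARTED if the target device is present; otherwise no scrub will run
 * for it, so it stays SUSPENDED. */
static enum replace_state resume_replace(enum replace_state st, bool tgtdev_present)
{
    if (st != REPLACE_SUSPENDED)
        return st;
    return tgtdev_present ? REPLACE_STARTED : REPLACE_SUSPENDED;
}
```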

    Fixes: e93c89c1aaaa ("Btrfs: add new sources for device replace code")
    CC: stable@vger.kernel.org # 4.4+
    Signed-off-by: Anand Jain
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba
    Signed-off-by: Sudip Mukherjee
    Signed-off-by: Greg Kroah-Hartman

    Anand Jain
     
  • commit 5c06147128fbbdf7a84232c5f0d808f53153defe upstream.

    When we fail to start a transaction in btrfs_dev_replace_start, we leave
    dev_replace->replace_state set to STARTED but clear ->srcdev and
    ->tgtdev. Later, that can result in an Oops in
    btrfs_dev_replace_progress, since having the state set to STARTED or
    SUSPENDED implies that ->srcdev is valid.

    Also fix error handling when the state is already STARTED or SUSPENDED
    while starting. That, too, will clear ->srcdev and ->tgtdev even though
    it doesn't own them. This should be an impossible case to hit since we
    should be protected by the BTRFS_FS_EXCL_OP bit being set. Let's add an
    ASSERT there while we're at it.
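
    A simplified model of both error paths, assuming a plain struct in
    place of the locked dev_replace state. All names, the
    ERR_ALREADY_STARTED code and the trans_ret parameter (standing in for
    the result of starting a transaction) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; loosely mirror BTRFS_IOCTL_DEV_REPLACE_STATE_*. */
enum replace_state { NEVER_STARTED, STARTED, SUSPENDED };

#define ERR_ALREADY_STARTED (-1)   /* hypothetical result code */

struct device { int devid; };

struct dev_replace {
    enum replace_state state;
    struct device *srcdev;
    struct device *tgtdev;
};

static int dev_replace_start(struct dev_replace *r, struct device *src,
                             struct device *tgt, int trans_ret)
{
    /* Someone else owns srcdev/tgtdev: report, but don't clear them.
     * Exclusive-op protection should make this unreachable. */
    if (r->state == STARTED || r->state == SUSPENDED)
        return ERR_ALREADY_STARTED;

    r->state = STARTED;
    r->srcdev = src;
    r->tgtdev = tgt;

    if (trans_ret != 0) {
        /* Roll back state and devices together, so STARTED/SUSPENDED
         * always implies a valid srcdev. */
        r->state = NEVER_STARTED;
        r->srcdev = NULL;
        r->tgtdev = NULL;
        return trans_ret;
    }
    assert(r->srcdev != NULL);
    return 0;
}
```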

    Fixes: e93c89c1aaaaa ("Btrfs: add new sources for device replace code")
    CC: stable@vger.kernel.org # 4.4+
    Signed-off-by: Jeff Mahoney
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba
    Signed-off-by: Sudip Mukherjee
    Signed-off-by: Greg Kroah-Hartman

    Jeff Mahoney
     
  • commit 0ea295dd853e0879a9a30ab61f923c26be35b902 upstream.

    truncate_node() frees the page with f2fs_put_page(), but the page index
    is read after that, which is a use-after-free. This patch reads the
    index before freeing the page.
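
    The ordering fix in miniature: read what you need from an object before
    dropping the last reference to it. The page struct and both helpers are
    hypothetical, with free() standing in for f2fs_put_page() releasing the
    page.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical page object. */
struct page { unsigned long index; };

/* Stand-in for f2fs_put_page(): releases the page. */
static void put_page_model(struct page *p)
{
    free(p);
}

/* Read the index first; touching p->index after put_page_model() would
 * be a use-after-free. */
static unsigned long truncate_node_model(struct page *p)
{
    unsigned long index = p->index;

    put_page_model(p);
    return index;
}
```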

    Fixes: bf39c00a9a7f ("f2fs: drop obsolete node page when it is truncated")
    Cc:
    Signed-off-by: Pan Bian
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Sudip Mukherjee
    Signed-off-by: Greg Kroah-Hartman

    Pan Bian
     
  • commit ef68e831840c40c7d01b328b3c0f5d8c4796c232 upstream.

    When executing add_credits() we currently call cifs_reconnect()
    if the number of credits is zero and there are no requests in
    flight. In this case we may call cifs_reconnect() recursively
    twice and cause memory corruption given the following sequence
    of functions:

    mid1.callback() -> add_credits() -> cifs_reconnect() ->
    -> mid2.callback() -> add_credits() -> cifs_reconnect().

    Fix this by not calling cifs_reconnect() from add_credits() and
    checking for zero credits in the demultiplex thread instead.
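
    A single-threaded model of the restructuring, assuming simplified
    credit and in-flight counters. All names are hypothetical; the assert()
    in reconnect_model() plays the role of the memory corruption, firing if
    a reconnect ever re-entered itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the SMB credit bookkeeping. */
struct server_model {
    int credits;
    int in_flight;
    int reconnects;
    bool in_reconnect;
};

/* Runs from mid callbacks. After the fix it only accounts credits and
 * never reconnects. */
static void add_credits_model(struct server_model *s, int add)
{
    s->credits += add;
    s->in_flight--;
}

static void reconnect_model(struct server_model *s)
{
    assert(!s->in_reconnect);   /* recursion would trip this */
    s->in_reconnect = true;
    s->reconnects++;
    /* Pending requests are completed by running their callbacks. */
    while (s->in_flight > 0)
        add_credits_model(s, 0);
    s->credits = 1;             /* new session grants credits */
    s->in_reconnect = false;
}

/* Demultiplex-thread step: the zero-credit check lives here, outside any
 * callback. */
static void demultiplex_step(struct server_model *s)
{
    if (s->credits == 0 && s->in_flight == 0)
        reconnect_model(s);
}
```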

    Cc:
    Signed-off-by: Pavel Shilovsky
    Reviewed-by: Ronnie Sahlberg
    Signed-off-by: Steve French
    Signed-off-by: Greg Kroah-Hartman

    Pavel Shilovsky
     
  • commit ec678eae746dd25766a61c4095e2b649d3b20b09 upstream.

    We do need to account for credits received in error responses
    to read requests on encrypted sessions.

    Cc:
    Signed-off-by: Pavel Shilovsky
    Reviewed-by: Ronnie Sahlberg
    Signed-off-by: Steve French
    Signed-off-by: Greg Kroah-Hartman

    Pavel Shilovsky
     
  • commit 8004c78c68e894e4fd5ac3c22cc22eb7dc24cabc upstream.

    Currently we mark the MID as malformed if we get an error from the
    server in a read response. This leads to credits not being processed
    properly in the readv callback. Fix this by marking such a response as
    a normally received response and processing it appropriately.

    Cc:
    Signed-off-by: Pavel Shilovsky
    Reviewed-by: Ronnie Sahlberg
    Signed-off-by: Steve French
    Signed-off-by: Greg Kroah-Hartman

    Pavel Shilovsky
     
  • commit acc58d0bab55a50e02c25f00bd6a210ee121595f upstream.

    When doing MTU I/O we need to leave some credits for
    possible reopen requests and other operations happening
    in parallel. Currently we leave 1 credit, which is not
    enough even for reopen only: we need at least 2 credits
    if durable handle reconnect fails. Also, there may be
    other operations at the same time, including compounding
    ones which require 3 credits each. Fix this
    by leaving 8 credits, which is big enough to cover most
    scenarios.
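
    A sketch of the reservation rule, assuming a caller that asks for some
    number of credits for one large request. credits_for_mtu_io() is
    hypothetical and simpler than the real SMB2 wait logic; only the
    reserve of 8 comes from this commit.

```c
#include <assert.h>

/* Credits held back for reopen and other parallel operations. */
#define MTU_RESERVED_CREDITS 8

/* Hypothetical helper: how many credits one large read or write may
 * consume, given the credits currently available. */
static int credits_for_mtu_io(int available, int wanted)
{
    int usable = available - MTU_RESERVED_CREDITS;

    assert(wanted > 0);
    if (usable <= 0)
        return 0;   /* caller should fall back to a smaller request */
    return wanted < usable ? wanted : usable;
}
```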

    I was able to reproduce this when the server was configured
    to give out fewer credits than usual.

    The proper fix would be to reconnect a file handle first
    and then obtain credits for an MTU request but this leads
    to bigger code changes and should happen in other patches.

    Cc:
    Signed-off-by: Pavel Shilovsky
    Signed-off-by: Steve French
    Signed-off-by: Greg Kroah-Hartman

    Pavel Shilovsky
     

26 Jan, 2019

7 commits

  • commit 7420451f6a109f7f8f1bf283f34d08eba3259fb3 upstream.

    Allow disabling cifs (SMB1, i.e. vers=1.0) and vers=2.0 in the
    config for the build of cifs.ko, for those who want to always prevent
    mounting with these less secure dialects.

    Signed-off-by: Steve French
    Reviewed-by: Aurelien Aptel
    Reviewed-by: Jeremy Allison
    Cc: Alakesh Haloi
    Signed-off-by: Greg Kroah-Hartman

    Steve French
     
  • commit c156618e15101a9cc8c815108fec0300a0ec6637 upstream.

    The following deadlock can occur between a process waiting for a client
    to initialize while walking the client list during NFSv4 server trunking
    detection and another process waiting for the nfs_clid_init_mutex so it
    can initialize that client:

    Process 1                             Process 2
    ---------                             ---------
    spin_lock(&nn->nfs_client_lock);
    list_add_tail(&CLIENTA->cl_share_link,
            &nn->nfs_client_list);
    spin_unlock(&nn->nfs_client_lock);
                                          spin_lock(&nn->nfs_client_lock);
                                          list_add_tail(&CLIENTB->cl_share_link,
                                                  &nn->nfs_client_list);
                                          spin_unlock(&nn->nfs_client_lock);
                                          mutex_lock(&nfs_clid_init_mutex);
                                          nfs41_walk_client_list(clp, result, cred);
                                          nfs_wait_client_init_complete(CLIENTA);
    (waiting for nfs_clid_init_mutex)

    Make sure nfs_match_client() only evaluates clients that have completed
    initialization in order to prevent that deadlock.

    This patch also fixes v4.0 trunking behavior by not marking the client
    NFS_CS_READY until the clientid has been confirmed.

    Signed-off-by: Scott Mayhew
    Signed-off-by: Anna Schumaker
    Signed-off-by: Qian Lu
    Signed-off-by: Greg Kroah-Hartman

    Scott Mayhew
     
  • [ Upstream commit 532e1e54c8140188e192348c790317921cb2dc1c ]

    mount.ocfs2 ignores the inconsistent state in which the journal is clean
    but the local alloc is unrecovered. After mount, the local alloc is not
    empty, so reserving clusters did not allocate a new local alloc window
    and the reservation map was empty (ocfs2_reservation_map.m_bitmap_len
    = 0), which triggered the following panic.

    This issue was reported at

    https://oss.oracle.com/pipermail/ocfs2-devel/2015-May/010854.html

    and the advice was to fix it during mount. But this is a very unusual
    inconsistent state: the journal dirty flag should normally be cleared
    only at the last stage of umount, once everything else has gone right.
    We may need to debug further to check that. In any case, to avoid
    possible further corruption, the mount should be aborted and fsck
    should be run.
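
    The check argued for above can be sketched as below, assuming the
    mount code can see whether the journal is dirty and how many local
    alloc bits are in use. struct mount_state and check_local_alloc() are
    hypothetical; the upstream fix may test different fields.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical summary of the on-disk state seen at mount time. */
struct mount_state {
    bool journal_dirty;    /* journal needs replay */
    unsigned int la_used;  /* bits in use in the local alloc */
};

/* A clean journal with a non-empty local alloc is inconsistent: refuse
 * the mount so fsck can be run. A dirty journal is left to recovery. */
static int check_local_alloc(const struct mount_state *ms)
{
    if (!ms->journal_dirty && ms->la_used != 0)
        return -EINVAL;
    return 0;
}
```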

    (mount.ocfs2,1765,1):ocfs2_load_local_alloc:353 ERROR: Local alloc hasn't been recovered!
    found = 6518, set = 6518, taken = 8192, off = 15912372
    ocfs2: Mounting device (202,64) on (node 0, slot 3) with ordered data mode.
    o2dlm: Joining domain 89CEAC63CC4F4D03AC185B44E0EE0F3F ( 0 1 2 3 4 5 6 8 ) 8 nodes
    ocfs2: Mounting device (202,80) on (node 0, slot 3) with ordered data mode.
    o2hb: Region 89CEAC63CC4F4D03AC185B44E0EE0F3F (xvdf) is now a quorum device
    o2net: Accepted connection from node yvwsoa17p (num 7) at 172.22.77.88:7777
    o2dlm: Node 7 joins domain 64FE421C8C984E6D96ED12C55FEE2435 ( 0 1 2 3 4 5 6 7 8 ) 9 nodes
    o2dlm: Node 7 joins domain 89CEAC63CC4F4D03AC185B44E0EE0F3F ( 0 1 2 3 4 5 6 7 8 ) 9 nodes
    ------------[ cut here ]------------
    kernel BUG at fs/ocfs2/reservations.c:507!
    invalid opcode: 0000 [#1] SMP
    Modules linked in: ocfs2 rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs fscache lockd grace ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sunrpc ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 ovmapi ppdev parport_pc parport xen_netfront fb_sys_fops sysimgblt sysfillrect syscopyarea acpi_cpufreq pcspkr i2c_piix4 i2c_core sg ext4 jbd2 mbcache2 sr_mod cdrom xen_blkfront pata_acpi ata_generic ata_piix floppy dm_mirror dm_region_hash dm_log dm_mod
    CPU: 0 PID: 4349 Comm: startWebLogic.s Not tainted 4.1.12-124.19.2.el6uek.x86_64 #2
    Hardware name: Xen HVM domU, BIOS 4.4.4OVM 09/06/2018
    task: ffff8803fb04e200 ti: ffff8800ea4d8000 task.ti: ffff8800ea4d8000
    RIP: 0010:[] [] __ocfs2_resv_find_window+0x498/0x760 [ocfs2]
    Call Trace:
    ocfs2_resmap_resv_bits+0x10d/0x400 [ocfs2]
    ocfs2_claim_local_alloc_bits+0xd0/0x640 [ocfs2]
    __ocfs2_claim_clusters+0x178/0x360 [ocfs2]
    ocfs2_claim_clusters+0x1f/0x30 [ocfs2]
    ocfs2_convert_inline_data_to_extents+0x634/0xa60 [ocfs2]
    ocfs2_write_begin_nolock+0x1c6/0x1da0 [ocfs2]
    ocfs2_write_begin+0x13e/0x230 [ocfs2]
    generic_perform_write+0xbf/0x1c0
    __generic_file_write_iter+0x19c/0x1d0
    ocfs2_file_write_iter+0x589/0x1360 [ocfs2]
    __vfs_write+0xb8/0x110
    vfs_write+0xa9/0x1b0
    SyS_write+0x46/0xb0
    system_call_fastpath+0x18/0xd7
    Code: ff ff 8b 75 b8 39 75 b0 8b 45 c8 89 45 98 0f 84 e5 fe ff ff 45 8b 74 24 18 41 8b 54 24 1c e9 56 fc ff ff 85 c0 0f 85 48 ff ff ff 0b 48 8b 05 cf c3 de ff 48 ba 00 00 00 00 00 00 00 10 48 85
    RIP __ocfs2_resv_find_window+0x498/0x760 [ocfs2]
    RSP
    ---[ end trace 566f07529f2edf3c ]---
    Kernel panic - not syncing: Fatal exception
    Kernel Offset: disabled

    Link: http://lkml.kernel.org/r/20181121020023.3034-2-junxiao.bi@oracle.com
    Signed-off-by: Junxiao Bi
    Reviewed-by: Yiwen Jiang
    Acked-by: Joseph Qi
    Cc: Jun Piao
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Changwei Ge
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin

    Junxiao Bi
     
  • [ Upstream commit 41c4f85cdac280d356df1f483000ecec4a8868be ]

    Commit 1fa5efe3622db58cb8c7b9a50665e9eb9a6c7e97 ("ext4: Use generic helpers for quotaon
    and quotaoff") made it possible to call quotactl(Q_XQUOTAON/OFF) on ext4 filesystems
    with sysfile quota support. This leads to calling dquot_enable/disable without s_umount
    held in exclusive mode, because quotactl_cmd_onoff() checks only for Q_QUOTAON/OFF.

    The following WARN_ON_ONCE triggers (in this case for dquot_enable, ext4, latest Linus' tree):

    [ 117.807056] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: quota,prjquota

    [...]

    [ 155.036847] WARNING: CPU: 0 PID: 2343 at fs/quota/dquot.c:2469 dquot_enable+0x34/0xb9
    [ 155.036851] Modules linked in: quota_v2 quota_tree ipv6 af_packet joydev mousedev psmouse serio_raw pcspkr i2c_piix4 intel_agp intel_gtt e1000 ttm drm_kms_helper drm agpgart fb_sys_fops syscopyarea sysfillrect sysimgblt i2c_core input_leds kvm_intel kvm irqbypass qemu_fw_cfg floppy evdev parport_pc parport button crc32c_generic dm_mod ata_generic pata_acpi ata_piix libata loop ext4 crc16 mbcache jbd2 usb_storage usbcore sd_mod scsi_mod
    [ 155.036901] CPU: 0 PID: 2343 Comm: qctl Not tainted 4.20.0-rc6-00025-gf5d582777bcb #9
    [ 155.036903] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
    [ 155.036911] RIP: 0010:dquot_enable+0x34/0xb9
    [ 155.036915] Code: 41 56 41 55 41 54 55 53 4c 8b 6f 28 74 02 0f 0b 4d 8d 7d 70 49 89 fc 89 cb 41 89 d6 89 f5 4c 89 ff e8 23 09 ea ff 85 c0 74 0a 0b 4c 89 ff e8 8b 09 ea ff 85 db 74 6a 41 8b b5 f8 00 00 00 0f
    [ 155.036918] RSP: 0018:ffffb09b00493e08 EFLAGS: 00010202
    [ 155.036922] RAX: 0000000000000001 RBX: 0000000000000008 RCX: 0000000000000008
    [ 155.036924] RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff9781b67cd870
    [ 155.036926] RBP: 0000000000000002 R08: 0000000000000000 R09: 61c8864680b583eb
    [ 155.036929] R10: ffffb09b00493e48 R11: ffffffffff7ce7d4 R12: ffff9781b7ee8d78
    [ 155.036932] R13: ffff9781b67cd800 R14: 0000000000000004 R15: ffff9781b67cd870
    [ 155.036936] FS: 00007fd813250b88(0000) GS:ffff9781ba000000(0000) knlGS:0000000000000000
    [ 155.036939] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 155.036942] CR2: 00007fd812ff61d6 CR3: 000000007c882000 CR4: 00000000000006b0
    [ 155.036951] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 155.036953] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [ 155.036955] Call Trace:
    [ 155.037004] dquot_quota_enable+0x8b/0xd0
    [ 155.037011] kernel_quotactl+0x628/0x74e
    [ 155.037027] ? do_mprotect_pkey+0x2a6/0x2cd
    [ 155.037034] __x64_sys_quotactl+0x1a/0x1d
    [ 155.037041] do_syscall_64+0x55/0xe4
    [ 155.037078] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [ 155.037105] RIP: 0033:0x7fd812fe1198
    [ 155.037109] Code: 02 77 0d 48 89 c1 48 c1 e9 3f 75 04 48 8b 04 24 48 83 c4 50 5b c3 48 83 ec 08 49 89 ca 48 63 d2 48 63 ff b8 b3 00 00 00 0f 05 89 c7 e8 c1 eb ff ff 5a c3 48 63 ff b8 bb 00 00 00 0f 05 48 89
    [ 155.037112] RSP: 002b:00007ffe8cd7b050 EFLAGS: 00000206 ORIG_RAX: 00000000000000b3
    [ 155.037116] RAX: ffffffffffffffda RBX: 00007ffe8cd7b148 RCX: 00007fd812fe1198
    [ 155.037119] RDX: 0000000000000000 RSI: 00007ffe8cd7cea9 RDI: 0000000000580102
    [ 155.037121] RBP: 00007ffe8cd7b0f0 R08: 000055fc8eba8a9d R09: 0000000000000000
    [ 155.037124] R10: 00007ffe8cd7b074 R11: 0000000000000206 R12: 00007ffe8cd7b168
    [ 155.037126] R13: 000055fc8eba8897 R14: 0000000000000000 R15: 0000000000000000
    [ 155.037131] ---[ end trace 210f864257175c51 ]---

    and then the syscall proceeds without s_umount locking.

    This patch takes the superblock's ->s_umount semaphore in exclusive mode for
    Q_XQUOTAON/OFF quotactls as well, in addition to Q_QUOTAON/OFF.
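
    A sketch of the widened predicate. The command values follow the Linux
    UAPI definitions (Q_QUOTAON/Q_QUOTAOFF in <linux/quota.h>, the
    XQM_CMD() family in <linux/dqblk_xfs.h>); they are defined locally so
    the example stands alone. The RDI value 0x580102 in the trace above
    appears to be QCMD(Q_XQUOTAON, PRJQUOTA), i.e. exactly the command that
    used to slip past the check.

```c
#include <assert.h>
#include <stdbool.h>

/* Values as in the Linux UAPI headers. */
#define Q_QUOTAON   0x800002
#define Q_QUOTAOFF  0x800003
#define XQM_CMD(x)  (('X' << 8) + (x))
#define Q_XQUOTAON  XQM_CMD(1)
#define Q_XQUOTAOFF XQM_CMD(2)
#define Q_XGETQSTAT XQM_CMD(5)

/* Commands that need s_umount held exclusively. Before the fix only the
 * first two were listed. */
static bool quotactl_cmd_onoff(int cmd)
{
    return cmd == Q_QUOTAON || cmd == Q_QUOTAOFF ||
           cmd == Q_XQUOTAON || cmd == Q_XQUOTAOFF;
}
```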

    AFAICT, other than ext4, only xfs and ocfs2 are affected by this change.
    The VFS will now call into the xfs_quota_* functions with s_umount held, which wasn't the case
    before. This looks good to me, but I cannot say for sure. Ext4 and ocfs2 were already
    being called with s_umount held exclusively via quota_quotaon/off, which is basically the same.

    Signed-off-by: Javier Barrio
    Signed-off-by: Jan Kara
    Signed-off-by: Sasha Levin

    Javier Barrio
     
  • [ Upstream commit 1690dd41e0cb1dade80850ed8a3eb0121b96d22f ]

    In the error handling block, err holds the return value of either
    btrfs_del_root_ref() or btrfs_del_inode_ref(), but it hasn't been checked
    since its introduction with commit fe66a05a0679 ("Btrfs: improve error
    handling for btrfs_insert_dir_item callers") in 2012.

    If the error handling itself fails, there's not much left to do, and
    the abort either happened earlier in the callees or is necessary here.

    So if one of btrfs_del_root_ref() or btrfs_del_inode_ref() failed, abort
    the transaction, but still return the original code of the failure
    stored in 'ret' as this will be reported to the user.
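
    The resulting error-path pattern, sketched with a hypothetical
    transaction struct: the cleanup result is checked and may abort the
    transaction, while the original error is what gets returned.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical transaction handle. */
struct trans {
    bool aborted;
    int abort_err;
};

static void abort_transaction(struct trans *t, int err)
{
    if (!t->aborted) {
        t->aborted = true;
        t->abort_err = err;
    }
}

/* ret: the original failure. err: result of the cleanup call (standing
 * in for btrfs_del_root_ref() or btrfs_del_inode_ref()). */
static int handle_insert_failure(struct trans *t, int ret, int err)
{
    if (err)
        abort_transaction(t, err);
    return ret;   /* the user still sees the original error */
}
```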

    Signed-off-by: Johannes Thumshirn
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba
    Signed-off-by: Sasha Levin

    Johannes Thumshirn
     
  • [ Upstream commit 30696378f68a9e3dad6bfe55938b112e72af00c2 ]

    The ramoops backend currently calls persistent_ram_save_old() even
    if a buffer is empty. While this appears to work, it does not seem
    like the right thing to do and could lead to future bugs, so let's avoid
    that. It also prevents misleading prints in the logs which claim the
    buffer is valid.

    I got something like:

    found existing buffer, size 0, start 0

    When I was expecting:

    no valid data in buffer (sig = ...)

    This bails out early (and reports with pr_debug()), since it's an
    acceptable state.
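
    A sketch of the early bail-out, assuming the zone header exposes the
    saved size. struct ram_zone and zone_save_old() are hypothetical; the
    kernel reports the empty case with pr_debug(), which this sketch only
    notes in a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical view of a persistent RAM zone header. */
struct ram_zone {
    size_t size;    /* bytes of saved data */
    size_t start;
};

/* Returns 1 if old data was saved, 0 if the buffer was empty. */
static int zone_save_old(const struct ram_zone *z)
{
    if (z->size == 0) {
        /* Empty is an acceptable state: bail out early (the kernel
         * logs this with pr_debug()). */
        return 0;
    }
    printf("found existing buffer, size %zu, start %zu\n", z->size, z->start);
    return 1;
}
```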

    Signed-off-by: Joel Fernandes (Google)
    Co-developed-by: Kees Cook
    Signed-off-by: Kees Cook
    Signed-off-by: Sasha Levin

    Joel Fernandes (Google)
     
  • [ Upstream commit a788c5272769ddbcdbab297cf386413eeac04463 ]

    jffs2_sync_fs makes the assumption that if CONFIG_JFFS2_FS_WRITEBUFFER
    is defined, then a write buffer is available and has been initialized.
    However, this is not the case when the mtd device has no
    out-of-band buffer:

    int jffs2_nand_flash_setup(struct jffs2_sb_info *c)
    {
            if (!c->mtd->oobsize)
                    return 0;
            ...

    The resulting call to cancel_delayed_work_sync, passing an uninitialized
    (but zeroed) delayed_work struct, forces lockdep to become disabled.

    [ 90.050639] overlayfs: upper fs does not support tmpfile.
    [ 90.652264] INFO: trying to register non-static key.
    [ 90.662171] the code is fine but needs lockdep annotation.
    [ 90.673090] turning off the locking correctness validator.
    [ 90.684021] CPU: 0 PID: 1762 Comm: mount_root Not tainted 4.14.63 #0
    [ 90.696672] Stack : 00000000 00000000 80d8f6a2 00000038 805f0000 80444600 8fe364f4 805dfbe7
    [ 90.713349] 80563a30 000006e2 8068370c 00000001 00000000 00000001 8e2fdc48 ffffffff
    [ 90.730020] 00000000 00000000 80d90000 00000000 00000106 00000000 6465746e 312e3420
    [ 90.746690] 6b636f6c 03bf0000 f8000000 20676e69 00000000 80000000 00000000 8e2c2a90
    [ 90.763362] 80d90000 00000001 00000000 8e2c2a90 00000003 80260dc0 08052098 80680000
    [ 90.780033] ...
    [ 90.784902] Call Trace:
    [ 90.789793] [] show_stack+0xb8/0x148
    [ 90.798659] [] register_lock_class+0x270/0x55c
    [ 90.809247] [] __lock_acquire+0x13c/0xf7c
    [ 90.818964] [] lock_acquire+0x194/0x1dc
    [ 90.828345] [] flush_work+0x200/0x24c
    [ 90.837374] [] __cancel_work_timer+0x158/0x210
    [ 90.847958] [] jffs2_sync_fs+0x20/0x54
    [ 90.857173] [] iterate_supers+0xf4/0x120
    [ 90.866729] [] sys_sync+0x44/0x9c
    [ 90.875067] [] syscall_common+0x34/0x58
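
    The shape of the fix, sketched as a guard on the write buffer's
    existence, assuming (as the setup code quoted above implies) that the
    buffer and its delayed work are only set up when oobsize is nonzero.
    All names are hypothetical; assert() stands in for the lockdep
    complaint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical superblock info. */
struct sb_info {
    unsigned int oobsize;
    char *wbuf;
    bool wbuf_work_initialized;
    int cancel_calls;
};

static char wbuf_storage[64];

static void nand_flash_setup(struct sb_info *c)
{
    if (!c->oobsize)
        return;   /* no write buffer; delayed work left uninitialized */
    c->wbuf = wbuf_storage;
    c->wbuf_work_initialized = true;
}

static void cancel_wbuf_work(struct sb_info *c)
{
    assert(c->wbuf_work_initialized);
    c->cancel_calls++;
}

/* Only touch the delayed work if a write buffer actually exists. */
static void sync_fs_model(struct sb_info *c)
{
    if (c->wbuf != NULL)
        cancel_wbuf_work(c);
}
```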

    Signed-off-by: Daniel Santos
    Reviewed-by: Hou Tao
    Signed-off-by: Boris Brezillon
    Signed-off-by: Sasha Levin

    Daniel Santos
     

23 Jan, 2019

4 commits

  • commit 04906b2f542c23626b0ef6219b808406f8dddbe9 upstream.

    bd_set_size() also updates the block device's block size. This is somewhat
    unexpected from its name, and at this point only blkdev_open() uses this
    functionality. Furthermore, this can result in changing the block size under
    a filesystem mounted on a loop device, which leads to livelocks inside
    __getblk_gfp() like:

    Sending NMI from CPU 0 to CPUs 1:
    NMI backtrace for cpu 1
    CPU: 1 PID: 10863 Comm: syz-executor0 Not tainted 4.18.0-rc5+ #151
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google
    01/01/2011
    RIP: 0010:__sanitizer_cov_trace_pc+0x3f/0x50 kernel/kcov.c:106
    ...
    Call Trace:
    init_page_buffers+0x3e2/0x530 fs/buffer.c:904
    grow_dev_page fs/buffer.c:947 [inline]
    grow_buffers fs/buffer.c:1009 [inline]
    __getblk_slow fs/buffer.c:1036 [inline]
    __getblk_gfp+0x906/0xb10 fs/buffer.c:1313
    __bread_gfp+0x2d/0x310 fs/buffer.c:1347
    sb_bread include/linux/buffer_head.h:307 [inline]
    fat12_ent_bread+0x14e/0x3d0 fs/fat/fatent.c:75
    fat_ent_read_block fs/fat/fatent.c:441 [inline]
    fat_alloc_clusters+0x8ce/0x16e0 fs/fat/fatent.c:489
    fat_add_cluster+0x7a/0x150 fs/fat/inode.c:101
    __fat_get_block fs/fat/inode.c:148 [inline]
    ...

    Trivial reproducer for the problem looks like:

    truncate -s 1G /tmp/image
    losetup /dev/loop0 /tmp/image
    mkfs.ext4 -b 1024 /dev/loop0
    mount -t ext4 /dev/loop0 /mnt
    losetup -c /dev/loop0
    ls /mnt

    Fix the problem by moving initialization of the block device's block size
    into a separate function and calling it when needed.
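
    The block-size choice that now lives in its own function can be modeled
    as below. This is a sketch of the doubling loop that bd_set_size() used
    to run, with a 4 KiB page size assumed and hypothetical names; keeping
    it apart from the size update is what stops losetup -c from changing
    the block size under a mounted filesystem.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumption: 4 KiB pages */

/* Start at the logical block size and double it while the doubled size
 * still divides the device size, stopping at the page size. */
static unsigned int init_blocksize(unsigned int logical_bs, unsigned long long dev_size)
{
    unsigned int bsize = logical_bs;

    while (bsize < SKETCH_PAGE_SIZE) {
        if (dev_size & bsize)
            break;
        bsize <<= 1;
    }
    return bsize;
}
```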

    Thanks to Tetsuo Handa for help with
    debugging the problem.

    Reported-by: syzbot+9933e4476f365f5d5a1b@syzkaller.appspotmail.com
    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Jan Kara
     
  • commit 5631e8576a3caf606cdc375f97425a67983b420c upstream.

    Yue Hu noticed that when parsing the device tree, the allocated platform
    data was never freed. Since it's not used beyond the function scope, this
    switches to using a stack variable instead.

    Reported-by: Yue Hu
    Fixes: 35da60941e44 ("pstore/ram: add Device Tree bindings")
    Cc: stable@vger.kernel.org
    Signed-off-by: Kees Cook
    Signed-off-by: Greg Kroah-Hartman

    Kees Cook
     
  • commit 74d5d229b1bf60f93bff244b2dfc0eb21ec32a07 upstream.

    If we flip read-only before we initiate writeback on all dirty pages for
    ordered extents we've created then we'll have ordered extents left over
    on umount, which results in all sorts of bad things happening. Fix this
    by making sure we wait on ordered extents if we have to do the aborted
    transaction cleanup stuff.

    generic/475 can produce this warning:

    [ 8531.177332] WARNING: CPU: 2 PID: 11997 at fs/btrfs/disk-io.c:3856 btrfs_free_fs_root+0x95/0xa0 [btrfs]
    [ 8531.183282] CPU: 2 PID: 11997 Comm: umount Tainted: G W 5.0.0-rc1-default+ #394
    [ 8531.185164] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),BIOS rel-1.11.2-0-gf9626cc-prebuilt.qemu-project.org 04/01/2014
    [ 8531.187851] RIP: 0010:btrfs_free_fs_root+0x95/0xa0 [btrfs]
    [ 8531.193082] RSP: 0018:ffffb1ab86163d98 EFLAGS: 00010286
    [ 8531.194198] RAX: ffff9f3449494d18 RBX: ffff9f34a2695000 RCX:0000000000000000
    [ 8531.195629] RDX: 0000000000000002 RSI: 0000000000000001 RDI:0000000000000000
    [ 8531.197315] RBP: ffff9f344e930000 R08: 0000000000000001 R09:0000000000000000
    [ 8531.199095] R10: 0000000000000000 R11: ffff9f34494d4ff8 R12:ffffb1ab86163dc0
    [ 8531.200870] R13: ffff9f344e9300b0 R14: ffffb1ab86163db8 R15:0000000000000000
    [ 8531.202707] FS: 00007fc68e949fc0(0000) GS:ffff9f34bd800000(0000)knlGS:0000000000000000
    [ 8531.204851] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 8531.205942] CR2: 00007ffde8114dd8 CR3: 000000002dfbd000 CR4:00000000000006e0
    [ 8531.207516] Call Trace:
    [ 8531.208175] btrfs_free_fs_roots+0xdb/0x170 [btrfs]
    [ 8531.210209] ? wait_for_completion+0x5b/0x190
    [ 8531.211303] close_ctree+0x157/0x350 [btrfs]
    [ 8531.212412] generic_shutdown_super+0x64/0x100
    [ 8531.213485] kill_anon_super+0x14/0x30
    [ 8531.214430] btrfs_kill_super+0x12/0xa0 [btrfs]
    [ 8531.215539] deactivate_locked_super+0x29/0x60
    [ 8531.216633] cleanup_mnt+0x3b/0x70
    [ 8531.217497] task_work_run+0x98/0xc0
    [ 8531.218397] exit_to_usermode_loop+0x83/0x90
    [ 8531.219324] do_syscall_64+0x15b/0x180
    [ 8531.220192] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [ 8531.221286] RIP: 0033:0x7fc68e5e4d07
    [ 8531.225621] RSP: 002b:00007ffde8116608 EFLAGS: 00000246 ORIG_RAX:00000000000000a6
    [ 8531.227512] RAX: 0000000000000000 RBX: 00005580c2175970 RCX:00007fc68e5e4d07
    [ 8531.229098] RDX: 0000000000000001 RSI: 0000000000000000 RDI:00005580c2175b80
    [ 8531.230730] RBP: 0000000000000000 R08: 00005580c2175ba0 R09:00007ffde8114e80
    [ 8531.232269] R10: 0000000000000000 R11: 0000000000000246 R12:00005580c2175b80
    [ 8531.233839] R13: 00007fc68eac61c4 R14: 00005580c2175a68 R15:0000000000000000

    Leaving a tree in the rb-tree:

    3853 void btrfs_free_fs_root(struct btrfs_root *root)
    3854 {
    3855 iput(root->ino_cache_inode);
    3856 WARN_ON(!RB_EMPTY_ROOT(&root->inode_tree));

    CC: stable@vger.kernel.org
    Reviewed-by: Nikolay Borisov
    Signed-off-by: Josef Bacik
    [ add stacktrace ]
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Josef Bacik
     
  • commit 77b7aad195099e7c6da11e94b7fa6ef5e6fb0025 upstream.

    This reverts commit e73e81b6d0114d4a303205a952ab2e87c44bd279.

    This patch causes a few problems:

    - adds latency to btrfs_finish_ordered_io
    - as btrfs_finish_ordered_io is used for the free space cache, generating
      more work from btrfs_btree_balance_dirty_nodelay could end up in the
      same workqueue, effectively deadlocking

    12260 kworker/u96:16+btrfs-freespace-write D
    [] balance_dirty_pages+0x6e6/0x7ad
    [] balance_dirty_pages_ratelimited+0x6bb/0xa90
    [] btrfs_finish_ordered_io+0x3da/0x770
    [] normal_work_helper+0x1c5/0x5a0
    [] process_one_work+0x1ee/0x5a0
    [] worker_thread+0x46/0x3d0
    [] kthread+0xf5/0x130
    [] ret_from_fork+0x24/0x30
    [] 0xffffffffffffffff

    Transaction commit will wait on the freespace cache:

    838 btrfs-transacti D
    [] btrfs_start_ordered_extent+0x154/0x1e0
    [] btrfs_wait_ordered_range+0xbd/0x110
    [] __btrfs_wait_cache_io+0x49/0x1a0
    [] btrfs_write_dirty_block_groups+0x10b/0x3b0
    [] commit_cowonly_roots+0x215/0x2b0
    [] btrfs_commit_transaction+0x37e/0x910
    [] transaction_kthread+0x14d/0x180
    [] kthread+0xf5/0x130
    [] ret_from_fork+0x24/0x30
    [] 0xffffffffffffffff

    And then writepages ends up waiting on transaction commit:

    9520 kworker/u96:13+flush-btrfs-1 D
    [] wait_current_trans+0xac/0xe0
    [] start_transaction+0x21b/0x4b0
    [] cow_file_range_inline+0x10b/0x6b0
    [] cow_file_range.isra.69+0x329/0x4a0
    [] run_delalloc_range+0x105/0x3c0
    [] writepage_delalloc+0x119/0x180
    [] __extent_writepage+0x10c/0x390
    [] extent_write_cache_pages+0x26f/0x3d0
    [] extent_writepages+0x4f/0x80
    [] do_writepages+0x17/0x60
    [] __writeback_single_inode+0x59/0x690
    [] writeback_sb_inodes+0x291/0x4e0
    [] __writeback_inodes_wb+0x87/0xb0
    [] wb_writeback+0x3bb/0x500
    [] wb_workfn+0x40d/0x610
    [] process_one_work+0x1ee/0x5a0
    [] worker_thread+0x1e0/0x3d0
    [] kthread+0xf5/0x130
    [] ret_from_fork+0x24/0x30
    [] 0xffffffffffffffff

    Eventually, we have every process in the system waiting on
    balance_dirty_pages(), and nobody is able to make progress on page
    writeback.

    The original patch tried to fix an OOM condition that happened on 4.4,
    but there was no success reproducing it on later kernels (4.19 and 4.20).
    This is more likely a problem in the OOM handling itself.

    Link: https://lore.kernel.org/linux-btrfs/20180528054821.9092-1-ethanlien@synology.com/
    Reported-by: Chris Mason
    CC: stable@vger.kernel.org # 4.18+
    CC: ethanlien
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    David Sterba
     

17 Jan, 2019

8 commits

  • commit 95cb67138746451cc84cf8e516e14989746e93b0 upstream.

    We are already using mapping_set_error() in fs/ext4/page_io.c, so all we
    need to do is use file_check_and_advance_wb_err() when handling
    fsync() requests in ext4_sync_file().
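    To illustrate the idea behind file_check_and_advance_wb_err(), here is a deliberately simplified model, assuming a plain counter plus last error per mapping. It is not the kernel's errseq_t (which packs the errno, a sequence counter and a "seen" flag into one 32-bit word), and all model_* names are hypothetical. The point it shows: a writeback error recorded on the mapping is reported once to each open file that calls fsync().

```c
#include <assert.h>
#include <errno.h>

/* Simplified, hypothetical model of per-mapping writeback error tracking. */
struct model_mapping {
    unsigned int err_seq; /* bumped every time a writeback error is recorded */
    int err;              /* most recent error, as a negative errno */
};

struct model_file {
    struct model_mapping *mapping;
    unsigned int seen_seq; /* snapshot of err_seq at open / last check */
};

/* Analogue of mapping_set_error(): record a writeback failure. */
static void model_set_error(struct model_mapping *m, int err)
{
    m->err = err;
    m->err_seq++;
}

/* Take a snapshot at open time (the real errseq_sample() has extra rules
 * for errors that nobody has seen yet; they are ignored here). */
static void model_open(struct model_file *f, struct model_mapping *m)
{
    f->mapping = m;
    f->seen_seq = m->err_seq;
}

/* Analogue of file_check_and_advance_wb_err(): report an error recorded
 * since this file last checked, then advance the snapshot so the same
 * error is not reported to this file again. */
static int model_check_and_advance(struct model_file *f)
{
    if (f->seen_seq == f->mapping->err_seq)
        return 0;
    f->seen_seq = f->mapping->err_seq;
    return f->mapping->err;
}
```

    Two files opened on the same mapping each see a given error exactly once, which is the per-file fsync() reporting this change enables for ext4.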

    Signed-off-by: Theodore Ts'o
    Cc: stable@kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Theodore Ts'o
     
  • commit ad211f3e94b314a910d4af03178a0b52a7d1ee0a upstream.

    Previously, in no-journal mode, we used __generic_file_fsync(). This
    triggers a lockdep warning, and in addition, it's not safe to depend on
    the inode writeback mechanism in the case of ext4. We can solve both
    problems by calling ext4_write_inode() directly.

    Signed-off-by: Theodore Ts'o
    Cc: stable@kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Theodore Ts'o
     
  • commit e86807862e6880809f191c4cea7f88a489f0ed34 upstream.

    The xfstests generic/475 test switches the underlying device with
    dm-error while running a stress test. This results in a large number
    of file system errors, and since we can't lock the buffer head when
    marking the superblock dirty in the ext4_grp_locked_error() case, it's
    possible for the superblock to be !buffer_uptodate() without
    buffer_write_io_error() being true.

    We need to set buffer_uptodate() before we call mark_buffer_dirty() or
    this will trigger a WARN_ON. It's safe to do this since the
    superblock must have been properly read into memory or the mount would
    not have succeeded. So if buffer_uptodate() is not set, we can
    safely assume that this happened due to a failed attempt to write the
    superblock.
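    The ordering argument above can be shown with a toy model of buffer state bits, assuming a single "warning" counter stands in for the kernel's WARN_ON in mark_buffer_dirty(). All model_* names are hypothetical; this sketches the logic described in the commit message, not the ext4 code itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a buffer head's state bits; names are hypothetical. */
struct model_bh {
    bool uptodate;
    bool dirty;
    bool write_io_error;
    int warnings; /* counts what would have been a WARN_ON in the kernel */
};

/* Mirrors the sanity check described above: dirtying a buffer whose
 * contents are not up to date triggers a warning. */
static void model_mark_dirty(struct model_bh *bh)
{
    if (!bh->uptodate)
        bh->warnings++;
    bh->dirty = true;
}

/* The superblock was read successfully at mount time, so a cleared
 * uptodate bit can only come from a failed write: restore the bit
 * (and clear any stale write error) before dirtying the buffer. */
static void model_commit_super(struct model_bh *sbh)
{
    if (sbh->write_io_error || !sbh->uptodate) {
        sbh->write_io_error = false;
        sbh->uptodate = true;
    }
    model_mark_dirty(sbh);
}
```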

    Signed-off-by: Theodore Ts'o
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Theodore Ts'o
     
  • commit 2b08b1f12cd664dc7d5c84ead9ff25ae97ad5491 upstream.

    The ext4_inline_data_fiemap() function calls fiemap_fill_next_extent()
    while still holding the xattr semaphore. This is not necessary, and it
    triggers a circular lockdep warning, because fiemap_fill_next_extent()
    could trigger a page fault when it writes into a user page. If that
    page is mmaped from
    the inline file in question, this could very well result in a
    deadlock.

    This problem can be reproduced using generic/519 with a file system
    configuration which has the inline_data feature enabled.
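    The general shape of the fix is "sample under the lock, drop the lock, then call the code that may fault". Below is a hypothetical single-threaded model: a boolean stands in for the xattr semaphore, and a callback stands in for fiemap_fill_next_extent(). The callback reports -EDEADLK if it is entered with the lock still held, modelling a fault path that would need the same lock. None of these names are ext4 APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of an inode with inline data. */
struct model_inode {
    bool xattr_sem_held;      /* stands in for the xattr semaphore */
    unsigned long inline_off; /* where the inline data lives */
    unsigned long inline_len;
};

typedef int (*model_fill_fn)(struct model_inode *inode,
                             unsigned long off, unsigned long len);

/* Stand-in for a fill routine that faults on an mmap of this very inode:
 * in this model the fault path needs the same lock, so entering with the
 * lock held is reported as a would-be deadlock. */
static int model_fill_that_faults(struct model_inode *inode,
                                  unsigned long off, unsigned long len)
{
    (void)off;
    (void)len;
    return inode->xattr_sem_held ? -EDEADLK : 0;
}

/* Fixed pattern: copy what we need while holding the lock, release it,
 * and only then call the routine that may fault. */
static int model_inline_fiemap(struct model_inode *inode, model_fill_fn fill)
{
    unsigned long off, len;

    inode->xattr_sem_held = true;  /* take the lock */
    off = inode->inline_off;
    len = inode->inline_len;
    inode->xattr_sem_held = false; /* drop it before touching user memory */

    return fill(inode, off, len);
}
```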

    Signed-off-by: Theodore Ts'o
    Cc: stable@kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Theodore Ts'o
     
  • commit 812c0cab2c0dfad977605dbadf9148490ca5d93f upstream.

    There are enough credits reserved for most dioread_nolock writes;
    however, if the extent tree is sufficiently deep, and/or quota is
    enabled, the code was not allowing for all eventualities when
    reserving journal credits for the unwritten extent conversion.

    This problem can be seen using xfstests ext4/034:

    WARNING: CPU: 1 PID: 257 at fs/ext4/ext4_jbd2.c:271 __ext4_handle_dirty_metadata+0x10c/0x180
    Workqueue: ext4-rsv-conversion ext4_end_io_rsv_work
    RIP: 0010:__ext4_handle_dirty_metadata+0x10c/0x180
    ...
    EXT4-fs: ext4_free_blocks:4938: aborting transaction: error 28 in __ext4_handle_dirty_metadata
    EXT4: jbd2_journal_dirty_metadata failed: handle type 11 started at line 4921, credits 4/0, errcode -28
    EXT4-fs error (device dm-1) in ext4_free_blocks:4950: error 28

    Signed-off-by: Theodore Ts'o
    Cc: stable@kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Theodore Ts'o
     
  • commit b9a74cde94957d82003fb9f7ab4777938ca851cd upstream.

    If maxBuf is small but non-zero, it could result in a zero-sized lock
    element array, which we would then try to access out of bounds (OOB).
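    The guard needed here is to reject a buffer that cannot hold the header plus at least one lock element before sizing the array. A minimal sketch, assuming hypothetical fixed-size header and element structs (the real cifs wire structures and sizes differ):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical wire layouts; only their relative sizes matter here. */
struct model_hdr   { unsigned char bytes[32]; };
struct model_range { unsigned char bytes[20]; };

/* Return how many lock elements fit in one request of max_buf bytes, or
 * -EINVAL if not even one fits (which would otherwise produce a
 * zero-sized array that later code indexes out of bounds). */
static int model_max_lock_ranges(size_t max_buf)
{
    if (max_buf < sizeof(struct model_hdr) + sizeof(struct model_range))
        return -EINVAL;
    return (int)((max_buf - sizeof(struct model_hdr)) /
                 sizeof(struct model_range));
}
```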

    Signed-off-by: Ross Lagerwall
    Signed-off-by: Steve French
    CC: Stable
    Signed-off-by: Greg Kroah-Hartman

    Ross Lagerwall
     
  • commit ee13919c2e8d1f904e035ad4b4239029a8994131 upstream.

    Currently we hide the EINTR code returned from sock_sendmsg()
    and return 0 instead. This makes a caller think that we
    successfully completed the network operation, which is not
    true. Fix this by properly returning EINTR to callers.
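    A sketch of the bug class, assuming the status mapping after the send loop takes the shape below (the function names are hypothetical and this is not the cifs source): a catch-all "else rc = 0" turns -EINTR into success, whereas resetting rc only for a positive byte count lets the interruption reach the caller.

```c
#include <assert.h>
#include <errno.h>

/* rc is either a negative errno from the last send or a positive byte
 * count. Buggy shape: everything except a "hard" error becomes 0. */
static int model_finish_send_buggy(int rc)
{
    if (rc < 0 && rc != -EINTR)
        return rc;
    return 0; /* -EINTR silently becomes success */
}

/* Fixed shape: only a positive byte count is mapped to success. */
static int model_finish_send(int rc)
{
    if (rc < 0 && rc != -EINTR)
        return rc;  /* hard error */
    else if (rc > 0)
        rc = 0;     /* data was sent: report success */
    return rc;      /* -EINTR now reaches the caller */
}
```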

    Cc:
    Signed-off-by: Pavel Shilovsky
    Reviewed-by: Jeff Layton
    Signed-off-by: Steve French
    Signed-off-by: Greg Kroah-Hartman

    Pavel Shilovsky
     
  • commit b983f7e92348d7e7d091db1b78b7915e9dd3d63a upstream.

    Currently for MTU requests we allocate maximum possible credits
    in advance and then adjust them according to the request size.
    While we were adjusting the number of credits belonging to the
    server, we were skipping adjustment of credits belonging to the
    request. This patch fixes it by setting request credits to
    CreditCharge field value of SMB2 packet header.

    Also ask for 1 more credit for async read and write operations to
    increase parallelism and match the behavior of other operations.
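    The adjustment can be sketched as follows, assuming the MS-SMB2 rule that one credit covers up to 64 KiB of payload (CreditCharge = (payload - 1) / 65536 + 1). The structs and functions are hypothetical. The point is that both counters change: the excess goes back to the server pool, and the request's own credit count shrinks to its CreditCharge (the half that was previously skipped).

```c
#include <assert.h>

#define MODEL_CREDIT_UNIT 65536u /* one credit covers 64 KiB of payload */

struct model_server { unsigned int credits; };
struct model_req    { unsigned int credits; unsigned int credit_charge; };

/* CreditCharge for a payload of len bytes (at least one credit). */
static unsigned int model_credit_charge(unsigned int len)
{
    if (len == 0)
        return 1;
    return (len - 1) / MODEL_CREDIT_UNIT + 1;
}

/* 'reserved' credits were taken up front for the largest possible
 * request; give the excess back to the server AND set the request's own
 * credit count to its CreditCharge. Assumes the payload fits. */
static void model_adjust_credits(struct model_server *srv,
                                 struct model_req *req,
                                 unsigned int reserved, unsigned int len)
{
    unsigned int charge = model_credit_charge(len);

    assert(charge <= reserved);
    srv->credits += reserved - charge;
    req->credit_charge = charge;
    req->credits = charge;
}
```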

    Signed-off-by: Pavel Shilovsky
    Signed-off-by: Steve French
    CC: Stable
    Signed-off-by: Greg Kroah-Hartman

    Pavel Shilovsky
     

13 Jan, 2019

6 commits

  • commit 3c1392d4c49962a31874af14ae9ff289cb2b3851 upstream.

    Updating mseq makes the client think the importer mds has accepted all
    prior cap messages and knows what caps the client wants. Actually, some
    cap messages may have been dropped because of an mseq mismatch.

    If mseq is left untouched, the importing cap's mds_wanted will later get
    reset by the cap import message.

    Cc: stable@vger.kernel.org
    Signed-off-by: "Yan, Zheng"
    Signed-off-by: Ilya Dryomov
    Signed-off-by: Greg Kroah-Hartman

    Yan, Zheng
     
  • commit b8eee0e90f9797b747113638bc75e739b192ad38 upstream.

    Commit 9d5b86ac13c5 ("fs/locks: Remove fl_nspid and use fs-specific l_pid
    for remote locks") specified that the l_pid returned for F_GETLK on a local
    file that has a remote lock should be the pid of the lock manager process.
    That commit, while updating other filesystems, failed to update lockd, such
    that locks created by lockd had their fl_pid set to that of the remote
    process holding the lock. Fix that here to be the pid of lockd.

    Also, fix the client case so that the returned lock pid is negative, which
    indicates a remote lock on a remote file.
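    From userspace, l_pid is what fcntl(2) F_GETLK returns in struct flock. A small sketch that queries for a conflicting whole-file write lock and interprets l_pid according to the convention described in this commit (negative for a remote lock on a remote file; for a lockd-held lock on a local file, a positive pid, namely lockd's). The helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Ask whether a write lock on the whole file would conflict.
 * Returns 0 and fills *out on success, -1 on error (errno set by fcntl). */
static int query_conflicting_lock(int fd, struct flock *out)
{
    memset(out, 0, sizeof(*out));
    out->l_type = F_WRLCK;
    out->l_whence = SEEK_SET;
    out->l_start = 0;
    out->l_len = 0; /* 0 means "to end of file" */
    return fcntl(fd, F_GETLK, out);
}

/* Interpret l_pid following the convention described in the commit. */
static const char *describe_lock_owner(const struct flock *fl)
{
    if (fl->l_type == F_UNLCK)
        return "no conflicting lock";
    return fl->l_pid < 0 ? "remote owner" : "local pid";
}
```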

    Fixes: 9d5b86ac13c5 ("fs/locks: Remove fl_nspid and use fs-specific...")
    Cc: stable@vger.kernel.org

    Signed-off-by: Benjamin Coddington
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Greg Kroah-Hartman

    Benjamin Coddington
     
  • commit 2d29f6b96d8f80322ed2dd895bca590491c38d34 upstream.

    Fix the resource group wrap-around logic in gfs2_rbm_find that commit
    e579ed4f44 broke. The bug can lead to unnecessary repeated scanning of the
    same bitmaps; there is a risk that future changes will turn this into an
    endless loop.
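    A correct wrap-around search visits each bitmap exactly once: start at a given index, wrap from the last bitmap back to the first, and stop upon returning to the starting index. A minimal sketch of that invariant, assuming a hypothetical array of per-bitmap "has free space" flags rather than the real gfs2 rbm structures:

```c
#include <assert.h>
#include <stdbool.h>

/* Search nr bitmaps starting at 'start', wrapping around to index 0.
 * has_space[i] says whether bitmap i can satisfy the request; *scanned
 * counts how many bitmaps were examined. */
static int model_find_bitmap(const bool has_space[], int nr, int start,
                             int *scanned)
{
    int i = start;

    *scanned = 0;
    do {
        (*scanned)++;
        if (has_space[i])
            return i;
        i = (i + 1) % nr; /* wrap from the last bitmap back to the first */
    } while (i != start); /* stop once we are back where we began */
    return -1;            /* every bitmap examined exactly once */
}
```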

    Fixes: e579ed4f44 ("GFS2: Introduce rbm field bii")
    Cc: stable@vger.kernel.org # v3.13+
    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Bob Peterson
    Signed-off-by: Greg Kroah-Hartman

    Andreas Gruenbacher
     
  • commit 6ff9b09e00a441599f3aacdf577254455a048bc9 upstream.

    In gfs2_create_inode, after setting and releasing the acl / default_acl, the
    acl / default_acl pointers are not set to NULL as they should be. In that
    state, when the function reaches label fail_free_acls, gfs2_create_inode will
    try to release the same acls again.

    Fix that by setting the pointers to NULL after releasing the acls. Slightly
    simplify the logic. Also, posix_acl_release checks for NULL already, so
    there is no need to duplicate those checks here.
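    The pattern described here is "release, then clear the pointer" so that a shared error label can release unconditionally. A sketch with a hypothetical refcounted ACL whose release function is NULL-safe (as the commit says posix_acl_release() is); a counter makes a second free observable.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy refcounted ACL; names are hypothetical. */
struct model_acl { int refcount; };

static int model_frees; /* counts frees so a double free would show up */

/* NULL-safe release, so callers need no NULL checks of their own. */
static void model_acl_release(struct model_acl *acl)
{
    if (acl && --acl->refcount == 0) {
        free(acl);
        model_frees++;
    }
}

static struct model_acl *model_acl_alloc(void)
{
    struct model_acl *acl = malloc(sizeof(*acl));

    if (acl)
        acl->refcount = 1;
    return acl;
}

/* Shape of the fixed create path: release each ACL once it has been
 * applied and clear the pointer, so the error label cannot release it
 * a second time. */
static int model_create_inode(int fail_late)
{
    struct model_acl *default_acl = model_acl_alloc();
    struct model_acl *acl = model_acl_alloc();

    /* ... ACLs would be applied to the new inode here ... */
    model_acl_release(default_acl);
    default_acl = NULL;
    model_acl_release(acl);
    acl = NULL;

    if (fail_late)
        goto fail_free_acls;
    return 0;

fail_free_acls:
    model_acl_release(default_acl); /* no-op: pointer already NULL */
    model_acl_release(acl);
    return -1;
}
```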

    Fixes: e01580bf9e4d ("gfs2: use generic posix ACL infrastructure")
    Reported-by: Pan Bian
    Cc: Christoph Hellwig
    Cc: stable@vger.kernel.org # v4.9+
    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Bob Peterson
    Signed-off-by: Greg Kroah-Hartman

    Andreas Gruenbacher
     
  • commit d47b41aceeadc6b58abc9c7c6485bef7cfb75636 upstream.

    According to the comment in dlm_user_request(), ua should be freed
    in dlm_free_lkb() after a successful attach to lkb.

    However, ua is attached to lkb not in set_lock_args() but later,
    inside request_lock().
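    The ownership rule behind this fix: until ua is attached to the lock block, the caller owns it and must free it on any earlier error path; only after attachment does freeing the lock block free ua. A sketch with hypothetical types (args_ok stands in for the argument-validation step that runs before attachment):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model types: user lock args and a lock block. */
struct model_ua  { int dummy; };
struct model_lkb { struct model_ua *ua; };

static int model_ua_frees; /* counts frees to make leaks observable */

static void model_free_ua(struct model_ua *ua)
{
    if (ua) {
        free(ua);
        model_ua_frees++;
    }
}

/* Freeing the lock block frees ua only if it was attached. */
static void model_free_lkb(struct model_lkb *lkb)
{
    model_free_ua(lkb->ua);
    lkb->ua = NULL;
}

/* An error before attachment leaves ua with the caller, who must free
 * it; otherwise it leaks. */
static int model_user_request(struct model_lkb *lkb, bool args_ok)
{
    struct model_ua *ua = malloc(sizeof(*ua));

    if (!ua)
        return -1;
    if (!args_ok) {
        model_free_ua(ua); /* not attached yet: caller frees */
        return -1;
    }
    lkb->ua = ua;          /* ownership transferred to the lock block */
    return 0;
}
```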

    Fixes: 597d0cae0f99 ("[DLM] dlm: user locks")
    Cc: stable@kernel.org # 2.6.19

    Signed-off-by: Vasily Averin
    Signed-off-by: David Teigland
    Signed-off-by: Greg Kroah-Hartman

    Vasily Averin
     
  • commit c0174726c3976e67da8649ac62cae43220ae173a upstream.

    Fixes: 6d40c4a708e0 ("dlm: improve error and debug messages")
    Cc: stable@kernel.org # 3.5

    Signed-off-by: Vasily Averin
    Signed-off-by: David Teigland
    Signed-off-by: Greg Kroah-Hartman

    Vasily Averin