12 Feb, 2019
2 commits
-
That way we can also poll non blk-mq queues. Mostly needed for
the NVMe multipath code, but could also be useful elsewhere.

Signed-off-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe
(cherry picked from commit ea435e1b9392a33deceaea2a16ebaa3397bead93) -
Fix bug of commit 74d46992e0d9 ("block: replace bi_bdev with a gendisk
pointer and partitions index").

bio_dev(bio) is used to find the dev state in function
__btrfsic_submit_bio. But when dev_state is added to the hashtable, it
is using dev_t of block_device.

bio_dev(bio) returns a dev_t of part0 which is different from dev_t in
block_device(bd_dev). bd_dev in block_device represents the exact
partition.

block_device.bd_dev =
bio->bi_partno (same as block_device.bd_partno) + bio_dev(bio).

When adding a dev_state into hashtable, we use the exact partition dev_t.
So when looking it up, it should also use the exact partition dev_t.

Reproducer of this bug:
Use MOUNT_OPTIONS="-o check_int" and run btrfs/001 in fstests.
Then there will be WARNING like below.

WARNING:
btrfs: attempt to write superblock which references block M @29523968 (sda7 /1111654400/2) which is never written!

Signed-off-by: Gu JinXiang
Reviewed-by: David Sterba
Signed-off-by: David Sterba
(cherry picked from commit d28e649a5c58b779b303c252c66ee84a0f2c3b32)
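The lookup mismatch described in this commit can be illustrated with a small user-space sketch. The macros below mirror the kernel's dev_t encoding (20 minor bits); part0_dev() and partition_dev() are hypothetical helpers introduced only for illustration, not btrfs functions, and the sketch assumes, as the message states, that the partition's dev_t is the part0 dev_t plus the partition number.

```c
#include <stdint.h>

/* User-space copies of the kernel's dev_t helpers (include/linux/kdev_t.h). */
#define MINORBITS 20
#define MINORMASK ((1U << MINORBITS) - 1)
#define MAJOR(dev) ((unsigned int)((dev) >> MINORBITS))
#define MINOR(dev) ((unsigned int)((dev) & MINORMASK))
#define MKDEV(ma, mi) (((uint32_t)(ma) << MINORBITS) | (mi))

/* Hypothetical helper: the dev_t of part0, which is what bio_dev(bio) yields. */
static uint32_t part0_dev(unsigned int major, unsigned int first_minor)
{
	return MKDEV(major, first_minor);
}

/* Hypothetical helper: the dev_t of the exact partition (block_device.bd_dev). */
static uint32_t partition_dev(uint32_t disk_dev, unsigned int partno)
{
	return disk_dev + partno;
}
```

For a disk at 8:0, partition_dev(part0_dev(8, 0), 7) gives 8:7 (sda7), whereas a lookup keyed on part0_dev(8, 0) alone searches for 8:0 and misses the entry that was stored under 8:7.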
07 Feb, 2019
6 commits
-
commit b469e7e47c8a075cc08bcd1e85d4365134bdcdd5 upstream.
When an event is reported on a sub-directory and the parent inode has
a mark mask with FS_EVENT_ON_CHILD|FS_ISDIR, the event will be sent to
fsnotify() even if the event type is not in the parent mark mask
(e.g. FS_OPEN).

Furthermore, if that event happened on a mount or a filesystem with
a mount/sb mark that does have that event type in their mask, the "on
child" event will be reported on the mount/sb mark. That is not
desired, because the user will get a duplicate event for the same action.

Note that the event reported on the victim inode is never merged with
the event reported on the parent inode, because of the check in
should_merge(): old_fsn->inode == new_fsn->inode.

Fix this by looking for a match of an actual event type (i.e. not just
FS_ISDIR) in parent's inode mark mask and by not reporting an "on child"
event to group if event type is only found on mount/sb marks.

[backport hint: The bug seems to have always been in fanotify, but this
patch will only apply cleanly to v4.19.y]

Cc: # v4.19
Signed-off-by: Amir Goldstein
Signed-off-by: Jan Kara
[amir: backport to v4.9]
Signed-off-by: Amir Goldstein
Signed-off-by: Greg Kroah-Hartman -
commit 28eb24ff75c5ac130eb326b3b4d0dcecfc0f427d upstream.
In case a hostname resolves to a different IP address (e.g. long
running mounts), make sure to resolve it every time prior to calling
generic_ip_connect() in reconnect.

Suggested-by: Steve French
Signed-off-by: Paulo Alcantara
Signed-off-by: Steve French
Signed-off-by: Pavel Shilovsky
Signed-off-by: Greg Kroah-Hartman -
commit e74c98ca2d6ae4376cc15fa2a22483430909d96b upstream.
This reverts commit 2d29f6b96d8f80322ed2dd895bca590491c38d34.
It turns out that the fix can lead to a ~20 percent performance regression
in initial writes to the page cache according to iozone. Let's revert this
for now to have more time for a proper fix.

Cc: stable@vger.kernel.org # v3.13+
Signed-off-by: Andreas Gruenbacher
Signed-off-by: Bob Peterson
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman -
commit 8fc75bed96bb94e23ca51bd9be4daf65c57697bf upstream.
Ensure that we return the fatal error value that caused us to exit
nfs_page_async_flush().

Fixes: c373fff7bd25 ("NFSv4: Don't special case "launder"")
Signed-off-by: Trond Myklebust
Cc: stable@vger.kernel.org # v4.12+
Reviewed-by: Benjamin Coddington
Signed-off-by: Anna Schumaker
Signed-off-by: Greg Kroah-Hartman -
commit 1dbd449c9943e3145148cc893c2461b72ba6fef0 upstream.
The nr_dentry_unused per-cpu counter tracks dentries in both the LRU
lists and the shrink lists where the DCACHE_LRU_LIST bit is set.

The shrink_dcache_sb() function moves dentries from the LRU list to a
shrink list and subtracts the dentry count from nr_dentry_unused. This
is incorrect as the nr_dentry_unused count will also be decremented in
shrink_dentry_list() via d_shrink_del().

To fix this double decrement, the decrement in the shrink_dcache_sb()
function is taken out.

Fixes: 4e717f5c1083 ("list_lru: remove special case function list_lru_dispose_all.")
Cc: stable@kernel.org
Signed-off-by: Waiman Long
Reviewed-by: Dave Chinner
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman -
commit 8e6e72aeceaaed5aeeb1cb43d3085de7ceb14f79 upstream.
Signed-off-by: Pavel Shilovsky
Signed-off-by: Steve French
CC: Stable
Signed-off-by: Greg Kroah-Hartman
31 Jan, 2019
7 commits
-
commit 0d228ece59a35a9b9e8ff0d40653234a6d90f61e upstream.
At the time of forced unmount we place the running replace in the
BTRFS_IOCTL_DEV_REPLACE_STATE_SUSPENDED state. When the system comes
back and the target device is missing, let the replace state remain
BTRFS_IOCTL_DEV_REPLACE_STATE_SUSPENDED instead of
BTRFS_IOCTL_DEV_REPLACE_STATE_STARTED, as there isn't any matching scrub
running as part of replace.

Fixes: e93c89c1aaaa ("Btrfs: add new sources for device replace code")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Anand Jain
Reviewed-by: David Sterba
Signed-off-by: David Sterba
Signed-off-by: Sudip Mukherjee
Signed-off-by: Greg Kroah-Hartman -
commit 5c06147128fbbdf7a84232c5f0d808f53153defe upstream.
When we fail to start a transaction in btrfs_dev_replace_start, we leave
dev_replace->replace_state set to STARTED but clear ->srcdev and
->tgtdev. Later, that can result in an Oops in
btrfs_dev_replace_progress, where having the state set to STARTED or
SUSPENDED implies that ->srcdev is valid.

Also fix error handling when the state is already STARTED or SUSPENDED
while starting. That, too, will clear ->srcdev and ->tgtdev even though
it doesn't own them. This should be an impossible case to hit since we
should be protected by the BTRFS_FS_EXCL_OP bit being set. Let's add an
ASSERT there while we're at it.

Fixes: e93c89c1aaaaa (Btrfs: add new sources for device replace code)
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Jeff Mahoney
Reviewed-by: David Sterba
Signed-off-by: David Sterba
Signed-off-by: Sudip Mukherjee
Signed-off-by: Greg Kroah-Hartman -
commit 0ea295dd853e0879a9a30ab61f923c26be35b902 upstream.
The function truncate_node frees the page with f2fs_put_page. However,
the page index is read after that. So, the patch reads the index before
freeing the page.

Fixes: bf39c00a9a7f ("f2fs: drop obsolete node page when it is truncated")
Cc:
Signed-off-by: Pan Bian
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim
Signed-off-by: Sudip Mukherjee
Signed-off-by: Greg Kroah-Hartman -
commit ef68e831840c40c7d01b328b3c0f5d8c4796c232 upstream.
When executing add_credits() we currently call cifs_reconnect()
if the number of credits is zero and there are no requests in
flight. In this case we may call cifs_reconnect() recursively
twice and cause memory corruption given the following sequence
of functions:

mid1.callback() -> add_credits() -> cifs_reconnect() ->
-> mid2.callback() -> add_credits() -> cifs_reconnect().

Fix this by not calling cifs_reconnect() in add_credits()
and checking for zero credits in the demultiplex thread.

Cc:
Signed-off-by: Pavel Shilovsky
Reviewed-by: Ronnie Sahlberg
Signed-off-by: Steve French
Signed-off-by: Greg Kroah-Hartman -
commit ec678eae746dd25766a61c4095e2b649d3b20b09 upstream.
We do need to account for credits received in error responses
to read requests on encrypted sessions.

Cc:
Signed-off-by: Pavel Shilovsky
Reviewed-by: Ronnie Sahlberg
Signed-off-by: Steve French
Signed-off-by: Greg Kroah-Hartman -
commit 8004c78c68e894e4fd5ac3c22cc22eb7dc24cabc upstream.
Currently we mark MID as malformed if we get an error from server
in a read response. This leads to not properly processing credits
in the readv callback. Fix this by marking such a response as a
normal received response and processing it appropriately.

Cc:
Signed-off-by: Pavel Shilovsky
Reviewed-by: Ronnie Sahlberg
Signed-off-by: Steve French
Signed-off-by: Greg Kroah-Hartman -
commit acc58d0bab55a50e02c25f00bd6a210ee121595f upstream.
When doing MTU i/o we need to leave some credits for
possible reopen requests and other operations happening
in parallel. Currently we leave 1 credit which is not
enough even for reopen only: we need at least 2 credits
if durable handle reconnect fails. Also there may be
other operations at the same time including compounding
ones which require 3 credits at a time each. Fix this
by leaving 8 credits which is big enough to cover most
scenarios.

Was able to reproduce this when server was configured
to give out fewer credits than usual.

The proper fix would be to reconnect a file handle first
and then obtain credits for an MTU request but this leads
to bigger code changes and should happen in other patches.

Cc:
Signed-off-by: Pavel Shilovsky
Signed-off-by: Steve French
Signed-off-by: Greg Kroah-Hartman
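The reserve described in this commit can be sketched as a small calculation. This is only an illustration under stated assumptions, not the cifs.ko implementation: mtu_credits_grantable() and MTU_CREDITS_RESERVE are hypothetical names, and only the value 8 comes from the commit message.

```c
/* Credits kept back for reopen and other parallel requests (value from the commit). */
#define MTU_CREDITS_RESERVE 8

/*
 * Hypothetical helper: how many credits an MTU read/write may take from
 * a pool of `available` credits while leaving the reserve untouched.
 * Returns 0 when the request has to wait for more credits.
 */
static unsigned int mtu_credits_grantable(unsigned int available,
					  unsigned int wanted)
{
	unsigned int spare;

	if (available <= MTU_CREDITS_RESERVE)
		return 0;
	spare = available - MTU_CREDITS_RESERVE;
	return wanted < spare ? wanted : spare;
}
```

With a reserve of 8, two compound operations (3 credits each) plus a reopen after a failed durable handle reconnect (2 credits) still fit, since 3 + 3 + 2 = 8; the previous reserve of 1 could not cover even the reopen.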
26 Jan, 2019
7 commits
-
commit 7420451f6a109f7f8f1bf283f34d08eba3259fb3 upstream.
Allow disabling cifs (SMB1, i.e. vers=1.0) and vers=2.0 in the
config for the build of cifs.ko, for those who want to always prevent
mounting with these less secure dialects.

Signed-off-by: Steve French
Reviewed-by: Aurelien Aptel
Reviewed-by: Jeremy Allison
Cc: Alakesh Haloi
Signed-off-by: Greg Kroah-Hartman -
commit c156618e15101a9cc8c815108fec0300a0ec6637 upstream.
The following deadlock can occur between a process waiting for a client
to initialize while walking the client list during nfsv4 server trunking
detection and another process waiting for the nfs_clid_init_mutex so it
can initialize that client:

Process 1                               Process 2
---------                               ---------
spin_lock(&nn->nfs_client_lock);
list_add_tail(&CLIENTA->cl_share_link,
        &nn->nfs_client_list);
spin_unlock(&nn->nfs_client_lock);
                                        spin_lock(&nn->nfs_client_lock);
                                        list_add_tail(&CLIENTB->cl_share_link,
                                                &nn->nfs_client_list);
                                        spin_unlock(&nn->nfs_client_lock);
                                        mutex_lock(&nfs_clid_init_mutex);
                                        nfs41_walk_client_list(clp, result, cred);
                                        nfs_wait_client_init_complete(CLIENTA);
(waiting for nfs_clid_init_mutex)

Make sure nfs_match_client() only evaluates clients that have completed
initialization in order to prevent that deadlock.

This patch also fixes v4.0 trunking behavior by not marking the client
NFS_CS_READY until the clientid has been confirmed.

Signed-off-by: Scott Mayhew
Signed-off-by: Anna Schumaker
Signed-off-by: Qian Lu
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 532e1e54c8140188e192348c790317921cb2dc1c ]
mount.ocfs2 ignores the inconsistency error where the journal is clean but
the local alloc is unrecovered. After mount, the local alloc is not empty, so
reserving clusters does not allocate a new local alloc window and the
reservation map stays empty (ocfs2_reservation_map.m_bitmap_len = 0), which
triggers the following panic.

This issue was reported at
https://oss.oracle.com/pipermail/ocfs2-devel/2015-May/010854.html
and the advice was to fix it during mount. But this is a very unusual
inconsistent state; usually the journal dirty flag is only cleared at the
last stage of umount, once everything else has gone right. We may need to do
further debugging to check that. In any case, to avoid possible further
corruption, the mount should be aborted and fsck should be run.

(mount.ocfs2,1765,1):ocfs2_load_local_alloc:353 ERROR: Local alloc hasn't been recovered!
found = 6518, set = 6518, taken = 8192, off = 15912372
ocfs2: Mounting device (202,64) on (node 0, slot 3) with ordered data mode.
o2dlm: Joining domain 89CEAC63CC4F4D03AC185B44E0EE0F3F ( 0 1 2 3 4 5 6 8 ) 8 nodes
ocfs2: Mounting device (202,80) on (node 0, slot 3) with ordered data mode.
o2hb: Region 89CEAC63CC4F4D03AC185B44E0EE0F3F (xvdf) is now a quorum device
o2net: Accepted connection from node yvwsoa17p (num 7) at 172.22.77.88:7777
o2dlm: Node 7 joins domain 64FE421C8C984E6D96ED12C55FEE2435 ( 0 1 2 3 4 5 6 7 8 ) 9 nodes
o2dlm: Node 7 joins domain 89CEAC63CC4F4D03AC185B44E0EE0F3F ( 0 1 2 3 4 5 6 7 8 ) 9 nodes
------------[ cut here ]------------
kernel BUG at fs/ocfs2/reservations.c:507!
invalid opcode: 0000 [#1] SMP
Modules linked in: ocfs2 rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs fscache lockd grace ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sunrpc ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 ovmapi ppdev parport_pc parport xen_netfront fb_sys_fops sysimgblt sysfillrect syscopyarea acpi_cpufreq pcspkr i2c_piix4 i2c_core sg ext4 jbd2 mbcache2 sr_mod cdrom xen_blkfront pata_acpi ata_generic ata_piix floppy dm_mirror dm_region_hash dm_log dm_mod
CPU: 0 PID: 4349 Comm: startWebLogic.s Not tainted 4.1.12-124.19.2.el6uek.x86_64 #2
Hardware name: Xen HVM domU, BIOS 4.4.4OVM 09/06/2018
task: ffff8803fb04e200 ti: ffff8800ea4d8000 task.ti: ffff8800ea4d8000
RIP: 0010:[] [] __ocfs2_resv_find_window+0x498/0x760 [ocfs2]
Call Trace:
ocfs2_resmap_resv_bits+0x10d/0x400 [ocfs2]
ocfs2_claim_local_alloc_bits+0xd0/0x640 [ocfs2]
__ocfs2_claim_clusters+0x178/0x360 [ocfs2]
ocfs2_claim_clusters+0x1f/0x30 [ocfs2]
ocfs2_convert_inline_data_to_extents+0x634/0xa60 [ocfs2]
ocfs2_write_begin_nolock+0x1c6/0x1da0 [ocfs2]
ocfs2_write_begin+0x13e/0x230 [ocfs2]
generic_perform_write+0xbf/0x1c0
__generic_file_write_iter+0x19c/0x1d0
ocfs2_file_write_iter+0x589/0x1360 [ocfs2]
__vfs_write+0xb8/0x110
vfs_write+0xa9/0x1b0
SyS_write+0x46/0xb0
system_call_fastpath+0x18/0xd7
Code: ff ff 8b 75 b8 39 75 b0 8b 45 c8 89 45 98 0f 84 e5 fe ff ff 45 8b 74 24 18 41 8b 54 24 1c e9 56 fc ff ff 85 c0 0f 85 48 ff ff ff 0b 48 8b 05 cf c3 de ff 48 ba 00 00 00 00 00 00 00 10 48 85
RIP __ocfs2_resv_find_window+0x498/0x760 [ocfs2]
RSP
---[ end trace 566f07529f2edf3c ]---
Kernel panic - not syncing: Fatal exception
Kernel Offset: disabled

Link: http://lkml.kernel.org/r/20181121020023.3034-2-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi
Reviewed-by: Yiwen Jiang
Acked-by: Joseph Qi
Cc: Jun Piao
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Changwei Ge
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Sasha Levin -
[ Upstream commit 41c4f85cdac280d356df1f483000ecec4a8868be ]
Commit 1fa5efe3622db58cb8c7b9a50665e9eb9a6c7e97 (ext4: Use generic helpers for quotaon
and quotaoff) made it possible to call quotactl(Q_XQUOTAON/OFF) on ext4 filesystems
with sysfile quota support. This leads to calling dquot_enable/disable without s_umount
held in excl. mode, because quotactl_cmd_onoff checks only for Q_QUOTAON/OFF.

The following WARN_ON_ONCE triggers (in this case for dquot_enable, ext4, latest Linus' tree):
[ 117.807056] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: quota,prjquota
[...]
[ 155.036847] WARNING: CPU: 0 PID: 2343 at fs/quota/dquot.c:2469 dquot_enable+0x34/0xb9
[ 155.036851] Modules linked in: quota_v2 quota_tree ipv6 af_packet joydev mousedev psmouse serio_raw pcspkr i2c_piix4 intel_agp intel_gtt e1000 ttm drm_kms_helper drm agpgart fb_sys_fops syscopyarea sysfillrect sysimgblt i2c_core input_leds kvm_intel kvm irqbypass qemu_fw_cfg floppy evdev parport_pc parport button crc32c_generic dm_mod ata_generic pata_acpi ata_piix libata loop ext4 crc16 mbcache jbd2 usb_storage usbcore sd_mod scsi_mod
[ 155.036901] CPU: 0 PID: 2343 Comm: qctl Not tainted 4.20.0-rc6-00025-gf5d582777bcb #9
[ 155.036903] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[ 155.036911] RIP: 0010:dquot_enable+0x34/0xb9
[ 155.036915] Code: 41 56 41 55 41 54 55 53 4c 8b 6f 28 74 02 0f 0b 4d 8d 7d 70 49 89 fc 89 cb 41 89 d6 89 f5 4c 89 ff e8 23 09 ea ff 85 c0 74 0a 0b 4c 89 ff e8 8b 09 ea ff 85 db 74 6a 41 8b b5 f8 00 00 00 0f
[ 155.036918] RSP: 0018:ffffb09b00493e08 EFLAGS: 00010202
[ 155.036922] RAX: 0000000000000001 RBX: 0000000000000008 RCX: 0000000000000008
[ 155.036924] RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff9781b67cd870
[ 155.036926] RBP: 0000000000000002 R08: 0000000000000000 R09: 61c8864680b583eb
[ 155.036929] R10: ffffb09b00493e48 R11: ffffffffff7ce7d4 R12: ffff9781b7ee8d78
[ 155.036932] R13: ffff9781b67cd800 R14: 0000000000000004 R15: ffff9781b67cd870
[ 155.036936] FS: 00007fd813250b88(0000) GS:ffff9781ba000000(0000) knlGS:0000000000000000
[ 155.036939] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 155.036942] CR2: 00007fd812ff61d6 CR3: 000000007c882000 CR4: 00000000000006b0
[ 155.036951] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 155.036953] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 155.036955] Call Trace:
[ 155.037004] dquot_quota_enable+0x8b/0xd0
[ 155.037011] kernel_quotactl+0x628/0x74e
[ 155.037027] ? do_mprotect_pkey+0x2a6/0x2cd
[ 155.037034] __x64_sys_quotactl+0x1a/0x1d
[ 155.037041] do_syscall_64+0x55/0xe4
[ 155.037078] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 155.037105] RIP: 0033:0x7fd812fe1198
[ 155.037109] Code: 02 77 0d 48 89 c1 48 c1 e9 3f 75 04 48 8b 04 24 48 83 c4 50 5b c3 48 83 ec 08 49 89 ca 48 63 d2 48 63 ff b8 b3 00 00 00 0f 05 89 c7 e8 c1 eb ff ff 5a c3 48 63 ff b8 bb 00 00 00 0f 05 48 89
[ 155.037112] RSP: 002b:00007ffe8cd7b050 EFLAGS: 00000206 ORIG_RAX: 00000000000000b3
[ 155.037116] RAX: ffffffffffffffda RBX: 00007ffe8cd7b148 RCX: 00007fd812fe1198
[ 155.037119] RDX: 0000000000000000 RSI: 00007ffe8cd7cea9 RDI: 0000000000580102
[ 155.037121] RBP: 00007ffe8cd7b0f0 R08: 000055fc8eba8a9d R09: 0000000000000000
[ 155.037124] R10: 00007ffe8cd7b074 R11: 0000000000000206 R12: 00007ffe8cd7b168
[ 155.037126] R13: 000055fc8eba8897 R14: 0000000000000000 R15: 0000000000000000
[ 155.037131] ---[ end trace 210f864257175c51 ]---

and then the syscall proceeds without s_umount locking.

This patch locks the superblock ->s_umount sem. in exclusive mode for all Q_XQUOTAON/OFF
quotactls too in addition to Q_QUOTAON/OFF.

AFAICT, other than ext4, only xfs and ocfs2 are affected by this change.
The VFS will now call in xfs_quota_* functions with s_umount held, which wasn't the case
before. This looks good to me but I can not say for sure. Ext4 and ocfs2 were already
being called with s_umount exclusive via quota_quotaon/off which is basically the same.

Signed-off-by: Javier Barrio
Signed-off-by: Jan Kara
Signed-off-by: Sasha Levin -
[ Upstream commit 1690dd41e0cb1dade80850ed8a3eb0121b96d22f ]
In the error handling block, err holds the return value of either
btrfs_del_root_ref() or btrfs_del_inode_ref() but it hasn't been checked
since its introduction with commit fe66a05a0679 (Btrfs: improve error
handling for btrfs_insert_dir_item callers) in 2012.

If the error handling in the error handling fails, there's not much left
to do and the abort either happened earlier in the callees or is
necessary here.

So if one of btrfs_del_root_ref() or btrfs_del_inode_ref() failed, abort
the transaction, but still return the original code of the failure
stored in 'ret' as this will be reported to the user.

Signed-off-by: Johannes Thumshirn
Reviewed-by: David Sterba
Signed-off-by: David Sterba
Signed-off-by: Sasha Levin -
[ Upstream commit 30696378f68a9e3dad6bfe55938b112e72af00c2 ]
The ramoops backend currently calls persistent_ram_save_old() even
if a buffer is empty. While this appears to work, it does not seem
like the right thing to do and could lead to future bugs, so let's avoid
that. It also prevents misleading prints in the logs which claim the
buffer is valid.

I got something like:
  found existing buffer, size 0, start 0
When I was expecting:
  no valid data in buffer (sig = ...)

This bails out early (and reports with pr_debug()), since it's an
acceptable state.

Signed-off-by: Joel Fernandes (Google)
Co-developed-by: Kees Cook
Signed-off-by: Kees Cook
Signed-off-by: Sasha Levin -
[ Upstream commit a788c5272769ddbcdbab297cf386413eeac04463 ]
jffs2_sync_fs makes the assumption that if CONFIG_JFFS2_FS_WRITEBUFFER
is defined then a write buffer is available and has been initialized.
However, this is not the case when the mtd device has no
out-of-band buffer:

int jffs2_nand_flash_setup(struct jffs2_sb_info *c)
{
	if (!c->mtd->oobsize)
		return 0;
...

The resulting call to cancel_delayed_work_sync passing an uninitialized
(but zeroed) delayed_work struct forces lockdep to become disabled.

[ 90.050639] overlayfs: upper fs does not support tmpfile.
[ 90.652264] INFO: trying to register non-static key.
[ 90.662171] the code is fine but needs lockdep annotation.
[ 90.673090] turning off the locking correctness validator.
[ 90.684021] CPU: 0 PID: 1762 Comm: mount_root Not tainted 4.14.63 #0
[ 90.696672] Stack : 00000000 00000000 80d8f6a2 00000038 805f0000 80444600 8fe364f4 805dfbe7
[ 90.713349] 80563a30 000006e2 8068370c 00000001 00000000 00000001 8e2fdc48 ffffffff
[ 90.730020] 00000000 00000000 80d90000 00000000 00000106 00000000 6465746e 312e3420
[ 90.746690] 6b636f6c 03bf0000 f8000000 20676e69 00000000 80000000 00000000 8e2c2a90
[ 90.763362] 80d90000 00000001 00000000 8e2c2a90 00000003 80260dc0 08052098 80680000
[ 90.780033] ...
[ 90.784902] Call Trace:
[ 90.789793] [] show_stack+0xb8/0x148
[ 90.798659] [] register_lock_class+0x270/0x55c
[ 90.809247] [] __lock_acquire+0x13c/0xf7c
[ 90.818964] [] lock_acquire+0x194/0x1dc
[ 90.828345] [] flush_work+0x200/0x24c
[ 90.837374] [] __cancel_work_timer+0x158/0x210
[ 90.847958] [] jffs2_sync_fs+0x20/0x54
[ 90.857173] [] iterate_supers+0xf4/0x120
[ 90.866729] [] sys_sync+0x44/0x9c
[ 90.875067] [] syscall_common+0x34/0x58

Signed-off-by: Daniel Santos
Reviewed-by: Hou Tao
Signed-off-by: Boris Brezillon
Signed-off-by: Sasha Levin
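The failure mode in this commit, cancelling a work item that setup never initialized, can be modelled in user space. The following is a sketch under stated assumptions and not the actual patch: every sketch_* name is hypothetical, and the `initialized` flag stands in for whatever check the real code uses to tell whether a write buffer exists.

```c
#include <stdbool.h>

/* Simplified stand-ins for the kernel structures; all names are hypothetical. */
struct sketch_delayed_work {
	bool initialized;	/* set once the work item has been set up */
	int cancel_calls;	/* counts how often cancel was invoked */
};

struct sketch_sb_info {
	unsigned int oobsize;	/* 0 means the device has no out-of-band area */
	struct sketch_delayed_work wbuf_dwork;
};

/* Mirrors the early return quoted in the commit: no OOB area, no write buffer setup. */
static int sketch_flash_setup(struct sketch_sb_info *c)
{
	if (!c->oobsize)
		return 0;
	c->wbuf_dwork.initialized = true;
	return 0;
}

/* Only cancel the delayed work if setup actually initialized it. */
static void sketch_sync_fs(struct sketch_sb_info *c)
{
	if (c->wbuf_dwork.initialized)
		c->wbuf_dwork.cancel_calls++;
}
```

The point of the guard is that a compile-time option alone says nothing about whether a particular device took the early-return path, so the sync path needs a runtime check as well.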
23 Jan, 2019
4 commits
-
commit 04906b2f542c23626b0ef6219b808406f8dddbe9 upstream.
bd_set_size() also updates the block device's block size. This is somewhat
unexpected from its name and at this point, only blkdev_open() uses this
functionality. Furthermore, this can result in changing block size under
a filesystem mounted on a loop device which leads to livelocks inside
__getblk_gfp() like:

Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 10863 Comm: syz-executor0 Not tainted 4.18.0-rc5+ #151
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:__sanitizer_cov_trace_pc+0x3f/0x50 kernel/kcov.c:106
...
Call Trace:
init_page_buffers+0x3e2/0x530 fs/buffer.c:904
grow_dev_page fs/buffer.c:947 [inline]
grow_buffers fs/buffer.c:1009 [inline]
__getblk_slow fs/buffer.c:1036 [inline]
__getblk_gfp+0x906/0xb10 fs/buffer.c:1313
__bread_gfp+0x2d/0x310 fs/buffer.c:1347
sb_bread include/linux/buffer_head.h:307 [inline]
fat12_ent_bread+0x14e/0x3d0 fs/fat/fatent.c:75
fat_ent_read_block fs/fat/fatent.c:441 [inline]
fat_alloc_clusters+0x8ce/0x16e0 fs/fat/fatent.c:489
fat_add_cluster+0x7a/0x150 fs/fat/inode.c:101
__fat_get_block fs/fat/inode.c:148 [inline]
...

Trivial reproducer for the problem looks like:
truncate -s 1G /tmp/image
losetup /dev/loop0 /tmp/image
mkfs.ext4 -b 1024 /dev/loop0
mount -t ext4 /dev/loop0 /mnt
losetup -c /dev/loop0
l /mnt

Fix the problem by moving initialization of a block device block size
into a separate function and call it when needed.

Thanks to Tetsuo Handa for help with debugging the problem.

Reported-by: syzbot+9933e4476f365f5d5a1b@syzkaller.appspotmail.com
Signed-off-by: Jan Kara
Signed-off-by: Jens Axboe
Signed-off-by: Greg Kroah-Hartman -
commit 5631e8576a3caf606cdc375f97425a67983b420c upstream.
Yue Hu noticed that when parsing device tree the allocated platform data
was never freed. Since it's not used beyond the function scope, this
switches to using a stack variable instead.

Reported-by: Yue Hu
Fixes: 35da60941e44 ("pstore/ram: add Device Tree bindings")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook
Signed-off-by: Greg Kroah-Hartman -
commit 74d5d229b1bf60f93bff244b2dfc0eb21ec32a07 upstream.
If we flip read-only before we initiate writeback on all dirty pages for
ordered extents we've created then we'll have ordered extents left over
on umount, which results in all sorts of bad things happening. Fix this
by making sure we wait on ordered extents if we have to do the aborted
transaction cleanup stuff.

generic/475 can produce this warning:
[ 8531.177332] WARNING: CPU: 2 PID: 11997 at fs/btrfs/disk-io.c:3856 btrfs_free_fs_root+0x95/0xa0 [btrfs]
[ 8531.183282] CPU: 2 PID: 11997 Comm: umount Tainted: G W 5.0.0-rc1-default+ #394
[ 8531.185164] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),BIOS rel-1.11.2-0-gf9626cc-prebuilt.qemu-project.org 04/01/2014
[ 8531.187851] RIP: 0010:btrfs_free_fs_root+0x95/0xa0 [btrfs]
[ 8531.193082] RSP: 0018:ffffb1ab86163d98 EFLAGS: 00010286
[ 8531.194198] RAX: ffff9f3449494d18 RBX: ffff9f34a2695000 RCX:0000000000000000
[ 8531.195629] RDX: 0000000000000002 RSI: 0000000000000001 RDI:0000000000000000
[ 8531.197315] RBP: ffff9f344e930000 R08: 0000000000000001 R09:0000000000000000
[ 8531.199095] R10: 0000000000000000 R11: ffff9f34494d4ff8 R12:ffffb1ab86163dc0
[ 8531.200870] R13: ffff9f344e9300b0 R14: ffffb1ab86163db8 R15:0000000000000000
[ 8531.202707] FS: 00007fc68e949fc0(0000) GS:ffff9f34bd800000(0000)knlGS:0000000000000000
[ 8531.204851] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8531.205942] CR2: 00007ffde8114dd8 CR3: 000000002dfbd000 CR4:00000000000006e0
[ 8531.207516] Call Trace:
[ 8531.208175] btrfs_free_fs_roots+0xdb/0x170 [btrfs]
[ 8531.210209] ? wait_for_completion+0x5b/0x190
[ 8531.211303] close_ctree+0x157/0x350 [btrfs]
[ 8531.212412] generic_shutdown_super+0x64/0x100
[ 8531.213485] kill_anon_super+0x14/0x30
[ 8531.214430] btrfs_kill_super+0x12/0xa0 [btrfs]
[ 8531.215539] deactivate_locked_super+0x29/0x60
[ 8531.216633] cleanup_mnt+0x3b/0x70
[ 8531.217497] task_work_run+0x98/0xc0
[ 8531.218397] exit_to_usermode_loop+0x83/0x90
[ 8531.219324] do_syscall_64+0x15b/0x180
[ 8531.220192] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 8531.221286] RIP: 0033:0x7fc68e5e4d07
[ 8531.225621] RSP: 002b:00007ffde8116608 EFLAGS: 00000246 ORIG_RAX:00000000000000a6
[ 8531.227512] RAX: 0000000000000000 RBX: 00005580c2175970 RCX:00007fc68e5e4d07
[ 8531.229098] RDX: 0000000000000001 RSI: 0000000000000000 RDI:00005580c2175b80
[ 8531.230730] RBP: 0000000000000000 R08: 00005580c2175ba0 R09:00007ffde8114e80
[ 8531.232269] R10: 0000000000000000 R11: 0000000000000246 R12:00005580c2175b80
[ 8531.233839] R13: 00007fc68eac61c4 R14: 00005580c2175a68 R15:0000000000000000

Leaving a tree in the rb-tree:

3853 void btrfs_free_fs_root(struct btrfs_root *root)
3854 {
3855 	iput(root->ino_cache_inode);
3856 	WARN_ON(!RB_EMPTY_ROOT(&root->inode_tree));

CC: stable@vger.kernel.org
Reviewed-by: Nikolay Borisov
Signed-off-by: Josef Bacik
[ add stacktrace ]
Signed-off-by: David Sterba
Signed-off-by: Greg Kroah-Hartman -
commit 77b7aad195099e7c6da11e94b7fa6ef5e6fb0025 upstream.
This reverts commit e73e81b6d0114d4a303205a952ab2e87c44bd279.
This patch causes a few problems:
- adds latency to btrfs_finish_ordered_io
- as btrfs_finish_ordered_io is used for free space cache, generating
more work from btrfs_btree_balance_dirty_nodelay could end up in the
same workqueue, effectively deadlocking

12260 kworker/u96:16+btrfs-freespace-write D
[] balance_dirty_pages+0x6e6/0x7ad
[] balance_dirty_pages_ratelimited+0x6bb/0xa90
[] btrfs_finish_ordered_io+0x3da/0x770
[] normal_work_helper+0x1c5/0x5a0
[] process_one_work+0x1ee/0x5a0
[] worker_thread+0x46/0x3d0
[] kthread+0xf5/0x130
[] ret_from_fork+0x24/0x30
[] 0xffffffffffffffff

Transaction commit will wait on the freespace cache:
838 btrfs-transacti D
[] btrfs_start_ordered_extent+0x154/0x1e0
[] btrfs_wait_ordered_range+0xbd/0x110
[] __btrfs_wait_cache_io+0x49/0x1a0
[] btrfs_write_dirty_block_groups+0x10b/0x3b0
[] commit_cowonly_roots+0x215/0x2b0
[] btrfs_commit_transaction+0x37e/0x910
[] transaction_kthread+0x14d/0x180
[] kthread+0xf5/0x130
[] ret_from_fork+0x24/0x30
[] 0xffffffffffffffff

And then writepages ends up waiting on transaction commit:
9520 kworker/u96:13+flush-btrfs-1 D
[] wait_current_trans+0xac/0xe0
[] start_transaction+0x21b/0x4b0
[] cow_file_range_inline+0x10b/0x6b0
[] cow_file_range.isra.69+0x329/0x4a0
[] run_delalloc_range+0x105/0x3c0
[] writepage_delalloc+0x119/0x180
[] __extent_writepage+0x10c/0x390
[] extent_write_cache_pages+0x26f/0x3d0
[] extent_writepages+0x4f/0x80
[] do_writepages+0x17/0x60
[] __writeback_single_inode+0x59/0x690
[] writeback_sb_inodes+0x291/0x4e0
[] __writeback_inodes_wb+0x87/0xb0
[] wb_writeback+0x3bb/0x500
[] wb_workfn+0x40d/0x610
[] process_one_work+0x1ee/0x5a0
[] worker_thread+0x1e0/0x3d0
[] kthread+0xf5/0x130
[] ret_from_fork+0x24/0x30
[] 0xffffffffffffffff

Eventually, we have every process in the system waiting on
balance_dirty_pages(), and nobody is able to make progress on page
writeback.

The original patch tried to fix an OOM condition that happened on 4.4, but
attempts to reproduce it on later kernels (4.19 and 4.20) were unsuccessful.
This is more likely a problem in OOM itself.

Link: https://lore.kernel.org/linux-btrfs/20180528054821.9092-1-ethanlien@synology.com/
Reported-by: Chris Mason
CC: stable@vger.kernel.org # 4.18+
CC: ethanlien
Signed-off-by: David Sterba
Signed-off-by: Greg Kroah-Hartman
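The three stack traces in this revert form a circular wait, which can be written down as a tiny wait-for graph. This is an illustrative model only: the task names and the waits_on[] table are one reading of the traces in the commit message, not kernel data structures.

```c
enum task { FREESPACE_WRITE, TRANSACTION_COMMIT, FLUSH_WRITEPAGES, NR_TASKS };

/* waits_on[t] is the task that t is blocked behind, as read from the traces. */
static const enum task waits_on[NR_TASKS] = {
	/* balance_dirty_pages() needs page writeback to make progress */
	[FREESPACE_WRITE]    = FLUSH_WRITEPAGES,
	/* btrfs_wait_ordered_range() on the free space cache */
	[TRANSACTION_COMMIT] = FREESPACE_WRITE,
	/* wait_current_trans() until the commit finishes */
	[FLUSH_WRITEPAGES]   = TRANSACTION_COMMIT,
};

/* Follow the wait chain from `start`; returns 1 if it leads back to `start`. */
static int in_wait_cycle(enum task start)
{
	enum task t = waits_on[start];
	int steps;

	for (steps = 0; steps < NR_TASKS; steps++) {
		if (t == start)
			return 1;
		t = waits_on[t];
	}
	return 0;
}
```

Every task in the table sits on the cycle, which matches the observation that nobody is able to make progress; removing the balance_dirty_pages() edge, as the revert does, breaks the loop.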
17 Jan, 2019
8 commits
-
commit 95cb67138746451cc84cf8e516e14989746e93b0 upstream.
We are already using mapping_set_error() in fs/ext4/page_io.c, so all we
need to do is to use file_check_and_advance_wb_err() when handling
fsync() requests in ext4_sync_file().

Signed-off-by: Theodore Ts'o
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman -
commit ad211f3e94b314a910d4af03178a0b52a7d1ee0a upstream.
In no-journal mode, we previously used __generic_file_fsync().
This triggers a lockdep warning, and in addition,
it's not safe to depend on the inode writeback mechanism in the case
of ext4. We can solve both problems by calling ext4_write_inode()
directly.

Signed-off-by: Theodore Ts'o
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman -
commit e86807862e6880809f191c4cea7f88a489f0ed34 upstream.
The xfstests generic/475 test switches the underlying device with
dm-error while running a stress test. This results in a large number
of file system errors, and since we can't lock the buffer head when
marking the superblock dirty in the ext4_grp_locked_error() case, it's
possible for the superblock to be !buffer_uptodate() without
buffer_write_io_error() being true.

We need to set buffer_uptodate() before we call mark_buffer_dirty() or
this will trigger a WARN_ON. It's safe to do this since the
superblock must have been properly read into memory or the mount would
not have been successful. So if buffer_uptodate() is not set, we can
safely assume that this happened due to a failed attempt to write the
superblock.

Signed-off-by: Theodore Ts'o
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman -
commit 2b08b1f12cd664dc7d5c84ead9ff25ae97ad5491 upstream.
The ext4_inline_data_fiemap() function calls fiemap_fill_next_extent()
while still holding the xattr semaphore. This is not necessary and it
triggers a circular lockdep warning. This is because
fiemap_fill_next_extent() could trigger a page fault when it writes
into a page. If that page is mmaped from
the inline file in question, this could very well result in a
deadlock.

This problem can be reproduced using generic/519 with a file system
configuration which has the inline_data feature enabled.

Signed-off-by: Theodore Ts'o
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman -
commit 812c0cab2c0dfad977605dbadf9148490ca5d93f upstream.
There are enough credits reserved for most dioread_nolock writes;
however, if the extent tree is sufficiently deep, and/or quota is
enabled, the code was not allowing for all eventualities when
reserving journal credits for the unwritten extent conversion.

This problem can be seen using xfstests ext4/034:
WARNING: CPU: 1 PID: 257 at fs/ext4/ext4_jbd2.c:271 __ext4_handle_dirty_metadata+0x10c/0x180
Workqueue: ext4-rsv-conversion ext4_end_io_rsv_work
RIP: 0010:__ext4_handle_dirty_metadata+0x10c/0x180
...
EXT4-fs: ext4_free_blocks:4938: aborting transaction: error 28 in __ext4_handle_dirty_metadata
EXT4: jbd2_journal_dirty_metadata failed: handle type 11 started at line 4921, credits 4/0, errcode -28
EXT4-fs error (device dm-1) in ext4_free_blocks:4950: error 28

Signed-off-by: Theodore Ts'o
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman -
commit b9a74cde94957d82003fb9f7ab4777938ca851cd upstream.
If maxBuf is small but non-zero, it could result in a zero-sized lock
element array, which we would then try to access out of bounds.

Signed-off-by: Ross Lagerwall
Signed-off-by: Steve French
CC: Stable
Signed-off-by: Greg Kroah-Hartman -
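To make the arithmetic behind the maxBuf entry above concrete, here is a small C sketch. The two size constants are made-up placeholders, not the real header or lock-range sizes, and rejecting undersized values up front is an assumption about how such a case can be guarded, not a quote of the actual patch.

```c
/* Placeholder sizes for illustration only; NOT the real structure sizes. */
#define MOCK_HDR_SIZE   32u
#define MOCK_RANGE_SIZE 20u

/*
 * Returns how many lock elements fit in max_buf after the header,
 * or -1 if max_buf cannot hold the header plus at least one element.
 * Without the check, a small non-zero max_buf would yield 0 elements.
 */
static long mock_lock_array_len(unsigned int max_buf)
{
    if (max_buf < MOCK_HDR_SIZE + MOCK_RANGE_SIZE)
        return -1;
    return (long)((max_buf - MOCK_HDR_SIZE) / MOCK_RANGE_SIZE);
}
```

With these placeholder sizes, a max_buf of 40 is non-zero yet leaves room for zero elements, which is the case the sketch rejects.
-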
commit ee13919c2e8d1f904e035ad4b4239029a8994131 upstream.
Currently we hide the EINTR code returned from sock_sendmsg()
and return 0 instead. This makes a caller think that we
successfully completed the network operation, which is not
true. Fix this by properly returning EINTR to callers.

Cc:
Signed-off-by: Pavel Shilovsky
Reviewed-by: Jeff Layton
Signed-off-by: Steve French
Signed-off-by: Greg Kroah-Hartman -
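The EINTR entry above comes down to not converting an error into success. A minimal userspace sketch follows; mock_send_fn and the mock_* functions are hypothetical stand-ins (the callback plays the role of sock_sendmsg()), so this shows the shape of the behavior and not the actual CIFS code.

```c
#include <errno.h>

/* Hypothetical callback standing in for sock_sendmsg(). */
typedef int (*mock_send_fn)(const void *buf, int len);

/* Test double: pretends the whole buffer was sent. */
static int mock_send_ok(const void *buf, int len)
{
    (void)buf;
    return len;
}

/* Test double: pretends the send was interrupted by a signal. */
static int mock_send_interrupted(const void *buf, int len)
{
    (void)buf;
    (void)len;
    return -EINTR;
}

/*
 * Returns 0 on success or a negative errno value. An -EINTR result is
 * passed through unchanged instead of being reported as 0.
 */
static int mock_smb_send(mock_send_fn send_fn, const void *buf, int len)
{
    int rc = send_fn(buf, len);

    if (rc < 0)
        return rc;
    return 0;
}
```
-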
commit b983f7e92348d7e7d091db1b78b7915e9dd3d63a upstream.
Currently for MTU requests we allocate maximum possible credits
in advance and then adjust them according to the request size.
While we were adjusting the number of credits belonging to the
server, we were skipping adjustment of credits belonging to the
request. This patch fixes it by setting request credits to the
CreditCharge field value of the SMB2 packet header.

Also ask for 1 more credit for async read and write operations to
increase parallelism and match the behavior of other operations.

Signed-off-by: Pavel Shilovsky
Signed-off-by: Steve French
CC: Stable
Signed-off-by: Greg Kroah-Hartman
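The credit adjustment in the last entry above can be illustrated with a short C sketch. The names struct mock_request, mock_credit_charge() and mock_adjust_credits() are hypothetical. The charge computation assumes the usual SMB2 rule of one credit per 64 KiB of payload, and the rest is a simplified model in which the request records its own charge and the unused part of the reservation is handed back.

```c
#define MOCK_CREDIT_UNIT 65536u   /* assumed: one credit covers 64 KiB */

struct mock_request {
    unsigned int credits;         /* credits owned by this request */
};

/* Credit charge for a payload: ceil(size / 64 KiB), at least 1. */
static unsigned int mock_credit_charge(unsigned int payload_size)
{
    if (payload_size == 0)
        return 1;
    return (payload_size - 1) / MOCK_CREDIT_UNIT + 1;
}

/*
 * Shrinks a reservation to what the request needs. The request's own
 * credit count is set to the charge, and the surplus that goes back to
 * the server pool is returned.
 */
static unsigned int mock_adjust_credits(struct mock_request *req,
                                        unsigned int reserved,
                                        unsigned int payload_size)
{
    unsigned int charge = mock_credit_charge(payload_size);

    if (charge > reserved)
        charge = reserved;        /* never claim more than was reserved */
    req->credits = charge;
    return reserved - charge;
}
```

For example, a 65537-byte payload against a reservation of 16 credits leaves the request with 2 credits and returns 14 to the pool.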
13 Jan, 2019
6 commits
-
commit 3c1392d4c49962a31874af14ae9ff289cb2b3851 upstream.
Updating mseq makes the client think the importer mds has accepted all prior
cap messages and the importer mds knows what caps the client wants. Actually,
some cap messages may have been dropped because of mseq mismatch.

If mseq is left untouched, the importing cap's mds_wanted will later get
reset by the cap import message.

Cc: stable@vger.kernel.org
Signed-off-by: "Yan, Zheng"
Signed-off-by: Ilya Dryomov
Signed-off-by: Greg Kroah-Hartman -
commit b8eee0e90f9797b747113638bc75e739b192ad38 upstream.
Commit 9d5b86ac13c5 ("fs/locks: Remove fl_nspid and use fs-specific l_pid
for remote locks") specified that the l_pid returned for F_GETLK on a local
file that has a remote lock should be the pid of the lock manager process.
That commit, while updating other filesystems, failed to update lockd, such
that locks created by lockd had their fl_pid set to that of the remote
process holding the lock. Fix that here to be the pid of lockd.

Also, fix the client case so that the returned lock pid is negative, which
indicates a remote lock on a remote file.

Fixes: 9d5b86ac13c5 ("fs/locks: Remove fl_nspid and use fs-specific...")
Cc: stable@vger.kernel.org
Signed-off-by: Benjamin Coddington
Signed-off-by: J. Bruce Fields
Signed-off-by: Greg Kroah-Hartman -
commit 2d29f6b96d8f80322ed2dd895bca590491c38d34 upstream.
Fix the resource group wrap-around logic in gfs2_rbm_find that commit
e579ed4f44 broke. The bug can lead to unnecessary repeated scanning of the
same bitmaps; there is a risk that future changes will turn this into an
endless loop.

Fixes: e579ed4f44 ("GFS2: Introduce rbm field bii")
Cc: stable@vger.kernel.org # v3.13+
Signed-off-by: Andreas Gruenbacher
Signed-off-by: Bob Peterson
Signed-off-by: Greg Kroah-Hartman -
commit 6ff9b09e00a441599f3aacdf577254455a048bc9 upstream.
In gfs2_create_inode, after setting and releasing the acl / default_acl, the
acl / default_acl pointers are not set to NULL as they should be. In that
state, when the function reaches label fail_free_acls, gfs2_create_inode will
try to release the same acls again.

Fix that by setting the pointers to NULL after releasing the acls. Slightly
simplify the logic. Also, posix_acl_release checks for NULL already, so
there is no need to duplicate those checks here.

Fixes: e01580bf9e4d ("gfs2: use generic posix ACL infrastructure")
Reported-by: Pan Bian
Cc: Christoph Hellwig
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Andreas Gruenbacher
Signed-off-by: Bob Peterson
Signed-off-by: Greg Kroah-Hartman -
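The double-release pattern from the gfs2_create_inode entry above can be sketched in userspace C. struct mock_acl and the mock_* functions are hypothetical models, not the kernel's posix ACL API. The sketch shows the two points the commit message makes: the release helper tolerates NULL, and the pointers are cleared after the first release so the shared failure label cannot release them a second time.

```c
#include <stdlib.h>

struct mock_acl {
    int refcount;
};

static int mock_frees;   /* counts objects actually freed */

static struct mock_acl *mock_acl_alloc(void)
{
    struct mock_acl *acl = calloc(1, sizeof(*acl));

    if (acl)
        acl->refcount = 1;
    return acl;
}

/* Like the helper named in the commit message, this accepts NULL. */
static void mock_acl_release(struct mock_acl *acl)
{
    if (acl && --acl->refcount == 0) {
        free(acl);
        mock_frees++;
    }
}

/* Hypothetical create path; fail_late forces an error after the release. */
static int mock_create_inode(int fail_late)
{
    struct mock_acl *acl = mock_acl_alloc();
    struct mock_acl *default_acl = mock_acl_alloc();
    int error = -1;

    if (!acl || !default_acl)
        goto fail_free_acls;

    /* ... the ACLs would be applied to the new inode here ... */
    mock_acl_release(default_acl);
    default_acl = NULL;          /* cleared so the label cannot re-release */
    mock_acl_release(acl);
    acl = NULL;

    if (fail_late)
        goto fail_free_acls;
    return 0;

fail_free_acls:
    mock_acl_release(default_acl);   /* no-ops once the pointers are NULL */
    mock_acl_release(acl);
    return error;
}
```

Forcing the late failure frees each object exactly once, since the calls under the label see NULL pointers.
-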
commit d47b41aceeadc6b58abc9c7c6485bef7cfb75636 upstream.
According to the comment in dlm_user_request(), ua should be freed
in dlm_free_lkb() after a successful attach to the lkb.

However, ua is attached to the lkb not in set_lock_args() but later,
inside request_lock().

Fixes 597d0cae0f99 ("[DLM] dlm: user locks")
Cc: stable@kernel.org # 2.6.19
Signed-off-by: Vasily Averin
Signed-off-by: David Teigland
Signed-off-by: Greg Kroah-Hartman -
commit c0174726c3976e67da8649ac62cae43220ae173a upstream.
Fixes 6d40c4a708e0 ("dlm: improve error and debug messages")
Cc: stable@kernel.org # 3.5
Signed-off-by: Vasily Averin
Signed-off-by: David Teigland
Signed-off-by: Greg Kroah-Hartman