07 Mar, 2015
15 commits
-
commit 23b133bdc452aa441fcb9b82cbf6dd05cfd342d0 upstream.
Check length of extended attributes and allocation descriptors when
loading inodes from disk. Otherwise corrupted filesystems could confuse
the code and make the kernel oops.

Reported-by: Carl Henrik Lunde
Signed-off-by: Jan Kara
Signed-off-by: Greg Kroah-Hartman
-
commit 79144954278d4bb5989f8b903adcac7a20ff2a5a upstream.
Store blocksize in a local variable in udf_fill_inode() since it is used
a lot of times.

Signed-off-by: Jan Kara
Signed-off-by: Greg Kroah-Hartman
-
commit d8ba1f971497c19cf80da1ea5391a46a5f9fbd41 upstream.
If the call to decode_rc_list() fails due to a memory allocation error,
then we need to truncate the array size to ensure that we only call
kfree() on those pointers that were allocated.

Reported-by: David Ramos
Fixes: 4aece6a19cf7f ("nfs41: cb_sequence xdr implementation")
Signed-off-by: Trond Myklebust
Signed-off-by: Greg Kroah-Hartman
-
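Regarding the decode_rc_list() fix above: the cleanup pattern it describes can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the NFS callback code. struct rc_list, struct rc_args, decode_all() and free_all() are hypothetical names, and the fail_at parameter merely simulates the allocation failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are not the kernel's structures. */
struct rc_list {
    int *refs;
};

struct rc_args {
    size_t nlists;           /* entries in lists[] that own memory */
    struct rc_list *lists;   /* array is NOT zero-initialised */
};

/*
 * Try to fill `wanted` entries. `fail_at` simulates an allocation
 * failure at that index (pass a value >= wanted for no failure).
 * Returns 0 on success, -1 on failure.
 */
static int decode_all(struct rc_args *args, size_t wanted, size_t fail_at)
{
    size_t i;

    args->nlists = 0;
    args->lists = malloc(wanted * sizeof(*args->lists));
    if (args->lists == NULL)
        return -1;

    for (i = 0; i < wanted; i++) {
        if (i == fail_at)
            break;                       /* simulated failure */
        args->lists[i].refs = malloc(sizeof(int));
        if (args->lists[i].refs == NULL)
            break;                       /* real failure */
    }

    /* Truncate the count so cleanup never touches unset pointers. */
    args->nlists = i;
    return i == wanted ? 0 : -1;
}

/* Free only what was allocated; returns the number of entries freed. */
static size_t free_all(struct rc_args *args)
{
    size_t i, freed = 0;

    assert(args->lists != NULL || args->nlists == 0);
    for (i = 0; i < args->nlists; i++) {
        free(args->lists[i].refs);
        freed++;
    }
    free(args->lists);
    args->lists = NULL;
    args->nlists = 0;
    return freed;
}
```

Without the truncation step, a cleanup loop that runs over all `wanted` entries would pass uninitialised pointers to free(), which is the userspace analogue of the kfree() problem the message describes.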
commit ea7c38fef0b774a5dc16fb0ca5935f0ae8568176 upstream.
If we have to do a return-on-close in the delegreturn code, then
we must ensure that the inode and super block remain referenced.

Cc: Peng Tao
Signed-off-by: Trond Myklebust
Reviewed-by: Peng Tao
Signed-off-by: Greg Kroah-Hartman
-
commit 03a9a42a1a7e5b3e7919ddfacc1d1cc81882a955 upstream.
Fix an Oopsable condition when nsm_mon_unmon is called as part of the
namespace cleanup, which now apparently happens after the utsname
has been freed.

Link: http://lkml.kernel.org/r/20150125220604.090121ae@neptune.home
Reported-by: Bruno Prémont
Signed-off-by: Trond Myklebust
Signed-off-by: Greg Kroah-Hartman
-
commit cb5d04bc39e914124e811ea55f3034d2379a5f6c upstream.
With pgio refactoring in v3.15, .init_read and .init_write can be
called with valid pgio->pg_lseg. File layout was fixed at that time
by commit c6194271f (pnfs: filelayout: support non page aligned
layouts). But the generic helper still needs to be fixed.

Signed-off-by: Peng Tao
Signed-off-by: Greg Kroah-Hartman
-
commit f4086a3d789dbe18949862276d83b8f49fce6d2f upstream.
Commit 411a99adffb4f (nfs: clear_request_commit while holding i_lock)
assumes that the nfs_commit_info always points to the inode->i_lock.
For historical reasons, that is not the case for O_DIRECT writes.

Cc: Weston Andros Adamson
Fixes: 411a99adffb4f ("nfs: clear_request_commit while holding i_lock")
Signed-off-by: Trond Myklebust
Signed-off-by: Greg Kroah-Hartman
-
commit 6ffa30d3f734d4f6b478081dfc09592021028f90 upstream.
Bruce reported seeing this warning pop when mounting using v4.1:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 1121 at kernel/sched/core.c:7300 __might_sleep+0xbd/0xd0()
do not call blocking ops when !TASK_RUNNING; state=1 set at [] prepare_to_wait+0x2f/0x90
Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc fscache ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm snd_timer ppdev joydev snd virtio_console virtio_balloon pcspkr serio_raw parport_pc parport pvpanic floppy soundcore i2c_piix4 virtio_blk virtio_net qxl drm_kms_helper ttm drm virtio_pci virtio_ring ata_generic virtio pata_acpi
CPU: 1 PID: 1121 Comm: nfsv4.1-svc Not tainted 3.19.0-rc4+ #25
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153950- 04/01/2014
0000000000000000 000000004e5e3f73 ffff8800b998fb48 ffffffff8186ac78
0000000000000000 ffff8800b998fba0 ffff8800b998fb88 ffffffff810ac9da
ffff8800b998fb68 ffffffff81c923e7 00000000000004d9 0000000000000000
Call Trace:
[] dump_stack+0x4c/0x65
[] warn_slowpath_common+0x8a/0xc0
[] warn_slowpath_fmt+0x55/0x70
[] ? prepare_to_wait+0x2f/0x90
[] ? prepare_to_wait+0x2f/0x90
[] __might_sleep+0xbd/0xd0
[] kmem_cache_alloc_trace+0x243/0x430
[] ? groups_alloc+0x3e/0x130
[] groups_alloc+0x3e/0x130
[] svcauth_unix_accept+0x16e/0x290 [sunrpc]
[] svc_authenticate+0xe1/0xf0 [sunrpc]
[] svc_process_common+0x244/0x6a0 [sunrpc]
[] bc_svc_process+0x1c4/0x260 [sunrpc]
[] nfs41_callback_svc+0x128/0x1f0 [nfsv4]
[] ? wait_woken+0xc0/0xc0
[] ? nfs4_callback_svc+0x60/0x60 [nfsv4]
[] kthread+0x11f/0x140
[] ? local_clock+0x15/0x30
[] ? kthread_create_on_node+0x250/0x250
[] ret_from_fork+0x7c/0xb0
[] ? kthread_create_on_node+0x250/0x250
---[ end trace 675220a11e30f4f2 ]---

nfs41_callback_svc does most of its work while in TASK_INTERRUPTIBLE,
which is just wrong. Fix that by finishing the wait immediately if we've
found that the list has something on it.

Also, we don't expect this kthread to accept signals, so we should be
using a TASK_UNINTERRUPTIBLE sleep instead. That, however, opens us up
to hung task warnings from the watchdog, so have the schedule_timeout
wake up every 60s if there's no callback activity.

Reported-by: "J. Bruce Fields"
Signed-off-by: Jeff Layton
Signed-off-by: Trond Myklebust
Signed-off-by: Greg Kroah-Hartman
-
commit 05fbf357d94152171bc50f8a369390f1f16efd89 upstream.
Lockless access to pte in pagemap_pte_range() might race with page
migration and trigger BUG_ON(!PageLocked()) in migration_entry_to_page():

CPU A (pagemap)                    CPU B (migration)

                                   lock_page()
                                   try_to_unmap(page, TTU_MIGRATION...)
                                     make_migration_entry()
                                     set_pte_at()
pte_to_pagemap_entry()
                                   remove_migration_ptes()
                                   unlock_page()
  if(is_migration_entry())
    migration_entry_to_page()
      BUG_ON(!PageLocked(page))

Also lockless read might be non-atomic if pte is larger than wordsize.
Other pte walkers (smaps, numa_maps, clear_refs) already lock ptes.

Fixes: 052fb0d635df ("proc: report file/anon bit in /proc/pid/pagemap")
Signed-off-by: Konstantin Khlebnikov
Reported-by: Andrey Ryabinin
Reviewed-by: Cyrill Gorcunov
Acked-by: Naoya Horiguchi
Acked-by: Kirill A. Shutemov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman
-
commit e9892d3cc853afdda2cc69e2576d9ddb5fafad71 upstream.
The commit 2d3d0c5 ("xfs: lobotomise xfs_trans_read_buf_map()") left
a landmine in the tracing code: trace_xfs_trans_buf_read() is now
called on all buffers that are read through this interface rather than
just buffers in transactions. For buffers outside transaction
context, bp->b_fspriv is null, and so the buf log item tracing
functions cannot be called. This causes a NULL pointer dereference
in the trace_xfs_trans_buf_read() function when tracing is turned
on.

Signed-off-by: Dave Chinner
Reviewed-by: Brian Foster
Signed-off-by: Dave Chinner
Signed-off-by: Greg Kroah-Hartman
-
commit 3443a3bca54588f43286b725d8648d33a38c86f1 upstream.
When the superblock is modified in a transaction, the commonly
modified fields are not actually copied to the superblock buffer to
avoid the buffer lock becoming a serialisation point. However, there
are some other operations that modify the superblock fields within
the transaction that don't directly log to the superblock but rely
on the changes to be applied during the transaction commit (to
minimise the buffer lock hold time).

When we do this, we fail to mark the buffer log item as being a
superblock buffer and that can lead to the buffer not being marked
with the correct type in the log and hence causing recovery issues.
Fix it by setting the type correctly, similar to xfs_mod_sb()...

Tested-by: Jan Kara
Signed-off-by: Dave Chinner
Reviewed-by: Brian Foster
Signed-off-by: Dave Chinner
Signed-off-by: Greg Kroah-Hartman
-
commit fe22d552b82d7cc7de1851233ae8bef579198637 upstream.
Conversion from local to extent format does not set the buffer type
correctly on the new extent buffer when symlink data is moved out
of line.

Fix the symlink code and leave a comment in the generic bmap code
reminding us that the format-specific data copy needs to set the
destination buffer type appropriately.

Tested-by: Jan Kara
Signed-off-by: Dave Chinner
Reviewed-by: Brian Foster
Signed-off-by: Dave Chinner
Signed-off-by: Greg Kroah-Hartman
-
commit f19b872b086711bb4b22c3a0f52f16aa920bcc61 upstream.
This leads to log recovery throwing errors like:
XFS (md0): Mounting V5 Filesystem
XFS (md0): Starting recovery (logdev: internal)
XFS (md0): Unknown buffer type 0!
XFS (md0): _xfs_buf_ioapply: no ops on block 0xaea8802/0x1
ffff8800ffc53800: 58 41 47 49 .....

Which is the AGI buffer magic number.
Ensure that we set the type appropriately in both unlink list
addition and removal.

Tested-by: Jan Kara
Signed-off-by: Dave Chinner
Reviewed-by: Brian Foster
Signed-off-by: Dave Chinner
Signed-off-by: Greg Kroah-Hartman
-
commit 0d612fb570b71ea2e49554a770cff4c489018b2c upstream.
Jan Kara reported that log recovery was finding buffers with invalid
types in them. This should not happen, and indicates a bug in the
logging of buffers. To catch this, add asserts to the buffer
formatting code to ensure that the buffer type is in range when the
transaction is committed.

We don't set a type on buffers being marked stale - they are not
going to get replayed, the format item exists only for recovery to
be able to prevent replay of the buffer, so the type does not
matter. Hence that needs special casing here.

Reported-by: Jan Kara
Tested-by: Jan Kara
Signed-off-by: Dave Chinner
Reviewed-by: Brian Foster
Signed-off-by: Dave Chinner
Signed-off-by: Greg Kroah-Hartman
-
commit 2d5b86e048780c5efa7f7d9708815555919e7b05 upstream.
As of v3.18, ext4 started rejecting a remount which changes the
journal_checksum option.

Prior to that, it was simply ignored; the problem here is that
if someone has this in their fstab for the root fs, now the box
fails to boot properly, because remount of root with the new options
will fail, and the box proceeds with a readonly root.

I think it is a little nicer behavior to accept the option, but
warn that it's being ignored, rather than failing the mount,
but that might be a subjective matter...

Reported-by: Cónräd
Signed-off-by: Eric Sandeen
Signed-off-by: Theodore Ts'o
Cc: Josh Boyer
Signed-off-by: Greg Kroah-Hartman
09 Feb, 2015
1 commit
-
Pull aio nested sleep annotation from Ben LaHaise,
* git://git.kvack.org/~bcrl/aio-fixes:
aio: annotate aio_read_event_ring for sleep patterns
08 Feb, 2015
1 commit
-
Pull btrfs fix from Chris Mason:
"Forrest Liu tracked down a missing blk_finish_plug in the btrfs
logging code. This isn't a new bug, and it's hard to hit. But, it's
safe enough for inclusion now, and in my for-linus branch"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: add missing blk_finish_plug in btrfs_sync_log()
06 Feb, 2015
1 commit
-
Nilfs2 eventually hangs in a stress test with fsstress program. This
issue was caused by the following deadlock over I_SYNC flag between
nilfs_segctor_thread() and writeback_sb_inodes():

nilfs_segctor_thread()
  nilfs_segctor_thread_construct()
    nilfs_segctor_unlock()
      nilfs_dispose_list()
        iput()
          iput_final()
            evict()
              inode_wait_for_writeback()  * wait for I_SYNC flag

writeback_sb_inodes()
  * set I_SYNC flag on inode->i_state
  __writeback_single_inode()
    do_writepages()
      nilfs_writepages()
        nilfs_construct_dsync_segment()
          nilfs_segctor_sync()
            * wait for completion of segment constructor
  inode_sync_complete()
    * clear I_SYNC flag after __writeback_single_inode() completed

writeback_sb_inodes() calls do_writepages() for dirty inodes after
setting I_SYNC flag on inode->i_state. do_writepages() in turn calls
nilfs_writepages(), which can run segment constructor and wait for its
completion. On the other hand, segment constructor calls iput(), which
can call evict() and wait for the I_SYNC flag on
inode_wait_for_writeback().

Since segment constructor doesn't know when I_SYNC will be set, it
cannot know whether iput() will block or not unless inode->i_nlink has a
non-zero count. We can prevent evict() from being called in iput() by
implementing sop->drop_inode(), but it's not preferable to leave inodes
with i_nlink == 0 for long periods because it even defers file
truncation and inode deallocation. So, this instead resolves the
deadlock by calling iput() asynchronously with a workqueue for inodes
with i_nlink == 0.

Signed-off-by: Ryusuke Konishi
Cc: Al Viro
Tested-by: Ryusuke Konishi
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
05 Feb, 2015
2 commits
-
Add missing blk_finish_plug in btrfs_sync_log()
Signed-off-by: Forrest Liu
Reviewed-by: David Sterba
Signed-off-by: Chris Mason
-
Pull cifs fixes from Steve French:
"Three small cifs fixes. One fixes a hang under stress, and the other
two are security related"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix MUST SecurityFlags filtering
Complete oplock break jobs before closing file handle
cifs: use memzero_explicit to clear stack buffer
04 Feb, 2015
1 commit
-
Under CONFIG_DEBUG_ATOMIC_SLEEP=y, aio_read_event_ring() will throw
warnings like the following due to being called from wait_event
context:

WARNING: CPU: 0 PID: 16006 at kernel/sched/core.c:7300 __might_sleep+0x7f/0x90()
do not call blocking ops when !TASK_RUNNING; state=1 set at [] prepare_to_wait_event+0x63/0x110
Modules linked in:
CPU: 0 PID: 16006 Comm: aio-dio-fcntl-r Not tainted 3.19.0-rc6-dgc+ #705
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
ffffffff821c0372 ffff88003c117cd8 ffffffff81daf2bd 000000000000d8d8
ffff88003c117d28 ffff88003c117d18 ffffffff8109beda ffff88003c117cf8
ffffffff821c115e 0000000000000061 0000000000000000 00007ffffe4aa300
Call Trace:
[] dump_stack+0x4c/0x65
[] warn_slowpath_common+0x8a/0xc0
[] warn_slowpath_fmt+0x46/0x50
[] ? prepare_to_wait_event+0x63/0x110
[] ? prepare_to_wait_event+0x63/0x110
[] __might_sleep+0x7f/0x90
[] mutex_lock+0x24/0x45
[] aio_read_events+0x4c/0x290
[] read_events+0x1ec/0x220
[] ? prepare_to_wait_event+0x110/0x110
[] ? hrtimer_get_res+0x50/0x50
[] SyS_io_getevents+0x4d/0xb0
[] system_call_fastpath+0x12/0x17
---[ end trace bde69eaf655a4fea ]---

There is not actually a bug here, so annotate the code to tell the
debug logic that everything is just fine and not to fire a false
positive.

Signed-off-by: Dave Chinner
Signed-off-by: Benjamin LaHaise
31 Jan, 2015
2 commits
-
Pull btrfs fix from Chris Mason:
"We have one more fix for btrfs in my for-linus branch - this was a bug
in the new raid5/6 scrubbing support"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
btrfs: fix raid56 scrub failed in xfstests btrfs/072
-
Pull quota and UDF fix from Jan Kara:
"A fix for UDF to properly free preallocated blocks and a fix for quota
so that Q_GETQUOTA quotactl reports correct numbers for XFS filesystem
(and similarly Q_XGETQUOTA quotactl works properly for other
filesystems)"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
quota: Switch ->get_dqblk() and ->set_dqblk() to use bytes as space units
udf: Release preallocation on last writeable close
30 Jan, 2015
1 commit
-
Pull NFS client bugfixes from Trond Myklebust:
"Highlights include:

- Stable fix for a NFSv4.1 Oops on mount
- Stable fix for an O_DIRECT deadlock condition
- Fix an issue with submounted volumes and fake duplicate inode
  numbers"

* tag 'nfs-for-3.19-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: Fix use of nfs_attr_use_mounted_on_fileid()
NFSv4.1: Fix an Oops in nfs41_walk_client_list
nfs: fix dio deadlock when O_DIRECT flag is flipped
28 Jan, 2015
3 commits
-
Currently ->get_dqblk() and ->set_dqblk() use struct fs_disk_quota which
tracks space limits and usage in 512-byte blocks. However VFS quotas
track usage in bytes (as some filesystems require that) and we need to
somehow pass this information. Up to now it wasn't a problem because we
didn't do any unit conversion (thus VFS quota routines happily stuck
number of bytes into d_bcount field of struct fs_disk_quota). Only if
you tried to use Q_XGETQUOTA or Q_XSETQLIM for VFS quotas (or Q_GETQUOTA
/ Q_SETQUOTA for XFS quotas), you got bogus results. Hardly anyone
tried this but reportedly some Samba users hit the problem in practice.
So if we want the interfaces to be compatible, we need to fix this.

We bite the bullet and define another quota structure used for passing
information from/to ->get_dqblk()/->set_dqblk(). It's somewhat sad we have
to have more conversion routines in fs/quota/quota.c and another copying
of quota structure slows down getting of quota information by about 2%
but it seems cleaner than overloading e.g. units of d_bcount to bytes.

CC: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig
Signed-off-by: Jan Kara
-
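Regarding the quota unit change above: the mismatch between byte counts and 512-byte blocks can be shown with a small sketch. The helper names and the BB_SHIFT macro are illustrative assumptions, not the names used in fs/quota/quota.c, and rounding partial blocks up is likewise an assumption about how such a conversion would typically behave.

```c
#include <assert.h>
#include <stdint.h>

#define BB_SHIFT 9  /* 2^9 = 512-byte block; macro name is illustrative */

/* Convert a count of 512-byte blocks to bytes. */
static uint64_t bb_to_bytes(uint64_t blocks)
{
    return blocks << BB_SHIFT;
}

/* Convert bytes to 512-byte blocks, rounding partial blocks up. */
static uint64_t bytes_to_bb(uint64_t bytes)
{
    return (bytes + (1ULL << BB_SHIFT) - 1) >> BB_SHIFT;
}

/*
 * What a caller computes when a byte count was stored unconverted in
 * a field that the caller interprets as 512-byte blocks.
 */
static uint64_t misread_bytes_as_blocks(uint64_t bytes)
{
    return bb_to_bytes(bytes);
}
```

With these helpers, 4096 bytes of usage is 8 blocks, while the unconverted value read back as blocks would be taken for 2097152 bytes, which is the kind of bogus result the message describes.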
Commit 6fb1ca92a640 "udf: Fix race between write(2) and close(2)"
changed the condition when preallocation is released. The idea was that
we don't want to release the preallocation for an inode on close when
there are other writeable file descriptors for the inode. However the
condition was written in the opposite way so we released preallocation
only if there were other writeable file descriptors. Fix the problem by
changing the condition properly.

CC: stable@vger.kernel.org
Fixes: 6fb1ca92a6409a9d5b0696447cd4997bc9aaf5a2
Reported-by: Fabian Frederick
Signed-off-by: Jan Kara
-
The xfstests btrfs/072 reports uncorrectable read errors in dmesg,
because scrub forgets to use commit_root for parity scrub routine
and scrub attempts to scrub those extent items whose contents are
not fully on disk.

To fix it, we just add the @search_commit_root flag back.
Signed-off-by: Gui Hecheng
Signed-off-by: Qu Wenruo
Reviewed-by: Miao Xie
Signed-off-by: Chris Mason
27 Jan, 2015
1 commit
-
If CONFIG_CIFS_WEAK_PW_HASH is not set, CIFSSEC_MUST_LANMAN
and CIFSSEC_MUST_PLNTXT are defined as 0.

When setting new SecurityFlags without any MUST flags,
your flags would be overwritten with CIFSSEC_MUST_LANMAN (0).

Signed-off-by: Niklas Cassel
Signed-off-by: Steve French
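
The message does not show the filtering code, so the following is only a plausible shape of the problem, written as a standalone sketch. The flag values and the filter_buggy()/filter_fixed() names are hypothetical; only the idea that a MUST flag compiled to 0 makes a mask test trivially true comes from the description above.

```c
#include <assert.h>

/* Illustrative values only; not the real CIFSSEC_* constants. */
#define MAY_SIGN     0x00001u
#define MUST_NTLMV2  0x04000u
#define MUST_LANMAN  0x0u      /* weak password hashes compiled out */

/*
 * Buggy filter: with MUST_LANMAN == 0, (flags & 0) == 0 is always
 * true, so any flags without another MUST bit collapse to 0.
 */
static unsigned int filter_buggy(unsigned int flags)
{
    if ((flags & MUST_NTLMV2) == MUST_NTLMV2)
        return MUST_NTLMV2;
    if ((flags & MUST_LANMAN) == MUST_LANMAN)
        return MUST_LANMAN;
    return flags;
}

/* Guarded filter: skip the test when the flag is defined as 0. */
static unsigned int filter_fixed(unsigned int flags)
{
    if ((flags & MUST_NTLMV2) == MUST_NTLMV2)
        return MUST_NTLMV2;
    if (MUST_LANMAN && (flags & MUST_LANMAN) == MUST_LANMAN)
        return MUST_LANMAN;
    return flags;
}
```

In this sketch, passing only MAY_SIGN returns 0 from filter_buggy() but is preserved by filter_fixed(); both behave the same when a real MUST bit is set.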
26 Jan, 2015
1 commit
-
Pull vfs fixes from Al Viro:
"A couple of fixes - deadlock in CIFS and build breakage in cris serial
driver (resurfaced f_dentry in there)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
VFS: Convert file->f_dentry->d_inode to file_inode()
fix deadlock in cifs_ioctl_clone()
24 Jan, 2015
1 commit
-
Pull btrfs fixes from Chris Mason:
"We have a few fixes in my for-linus branch.

Qu Wenruo's batch fixes a regression between some of our merge window pull
and the inode_cache feature. The rest are smaller bugs"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.
btrfs: Fix the bug that fs_info->pending_changes is never cleared.
btrfs: fix state->private cast on 32 bit machines
Btrfs: fix race deleting block group from space_info->ro_bgs list
Btrfs: fix incorrect freeing in scrub_stripe
btrfs: sync ioctl, handle errors after transaction start
22 Jan, 2015
3 commits
-
This function call was being optimized out during nfs_fhget(), leading
to situations where we have a valid fileid but still want to use the
mounted_on_fileid. For example, imagine we have our server configured
like this:

server % df
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1       9.1G  6.5G  1.9G  78% /
/dev/vdb1       487M  2.3M  456M   1% /exports
/dev/vdc1       487M  2.3M  456M   1% /exports/vol1
/dev/vdd1       487M  2.3M  456M   1% /exports/vol2

If our client mounts /exports and tries to do a "chown -R" across the
entire mountpoint, we will get a nasty message warning us about a circular
directory structure. Running chown with strace tells me that each directory
has the same device and inode number:

newfstatat(AT_FDCWD, "/nfs/", {st_dev=makedev(0, 38), st_ino=2, ...}) = 0
newfstatat(4, "vol1", {st_dev=makedev(0, 38), st_ino=2, ...}) = 0
newfstatat(4, "vol2", {st_dev=makedev(0, 38), st_ino=2, ...}) = 0

With this patch the mounted_on_fileid values are used for st_ino, so the
directory loop warning isn't reported.

Signed-off-by: Anna Schumaker
Signed-off-by: Trond Myklebust
-
If we start state recovery on a client that failed to initialise correctly,
then we are very likely to Oops.

Reported-by: "Mkrtchyan, Tigran"
Link: http://lkml.kernel.org/r/130621862.279655.1421851650684.JavaMail.zimbra@desy.de
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust
-
We only support swap file calling nfs_direct_IO. However, application
might be able to get to nfs_direct_IO if it toggles O_DIRECT flag
during IO and it can deadlock because we grab inode->i_mutex in
nfs_file_direct_write(). So return 0 in that case. Then the generic
layer will fall back to buffered IO.

Signed-off-by: Peng Tao
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust
21 Jan, 2015
2 commits
-
Commit 6b5fe46dfa52 (btrfs: do commit in sync_fs if there are pending
changes) will call btrfs_start_transaction() in sync_fs(), to handle
some operations needed to be done in next transaction.

However this can cause deadlock if the filesystem is frozen, with the
following sysrq+w output:
[ 143.255932] Call Trace:
[ 143.255936] [] schedule+0x29/0x70
[ 143.255939] [] __sb_start_write+0xb3/0x100
[ 143.255971] [] start_transaction+0x2e6/0x5a0 [btrfs]
[ 143.255992] [] btrfs_start_transaction+0x1b/0x20 [btrfs]
[ 143.256003] [] btrfs_sync_fs+0xca/0xd0 [btrfs]
[ 143.256007] [] sync_fs_one_sb+0x20/0x30
[ 143.256011] [] iterate_supers+0xe1/0xf0
[ 143.256014] [] sys_sync+0x55/0x90
[ 143.256017] [] system_call_fastpath+0x12/0x17
[ 143.256111] Call Trace:
[ 143.256114] [] schedule+0x29/0x70
[ 143.256119] [] rwsem_down_write_failed+0x1c5/0x2d0
[ 143.256123] [] call_rwsem_down_write_failed+0x13/0x20
[ 143.256131] [] thaw_super+0x28/0xc0
[ 143.256135] [] do_vfs_ioctl+0x3f5/0x540
[ 143.256187] [] SyS_ioctl+0x91/0xb0
[ 143.256213] [] system_call_fastpath+0x12/0x17

The reason is like the following:
(Holding s_umount)
VFS sync_fs stuff:
|- btrfs_sync_fs()
   |- btrfs_start_transaction()
      |- sb_start_intwrite()
      (Waiting thaw_fs to unfreeze)
                                        VFS thaw_fs stuff:
                                        thaw_fs()
                                        (Waiting sync_fs to release
                                         s_umount)

So deadlock happens.
This can be easily triggered by fstest/generic/068 with inode_cache
mount option.

The fix is to check if the fs is frozen; if it is, just
return and wait for the next transaction.

Cc: David Sterba
Reported-by: Gui Hecheng
Signed-off-by: Qu Wenruo
[enhanced comment, changed to SB_FREEZE_WRITE]
Signed-off-by: David Sterba
Signed-off-by: Chris Mason
-
fs_info->pending_changes is never cleared since the original code uses
cmpxchg(&fs_info->pending_changes, 0, 0), which will only clear it if
pending_changes is already 0.

This causes a lot of problems when the filesystem is mounted with the
inode_cache mount option.
If btrfs is mounted with inode_cache, pending_changes will always be
1, even when the fs is frozen.

Signed-off-by: Qu Wenruo
Reviewed-by: David Sterba
Signed-off-by: David Sterba
Signed-off-by: Chris Mason
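
The compare-and-exchange behaviour described above can be reproduced in userspace with C11 atomics. This is a sketch of the general technique, not the btrfs code: take_buggy() and take_fixed() are hypothetical names, and using an unconditional exchange as the remedy is an assumption, since the message does not state how the clearing was corrected.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Mirrors cmpxchg(ptr, 0, 0): store 0 only if the current value is
 * already 0. Returns the value that was observed. A non-zero value
 * is therefore reported but never cleared.
 */
static unsigned long take_buggy(atomic_ulong *pending)
{
    unsigned long expected = 0;

    /* On mismatch, `expected` is overwritten with the current value. */
    atomic_compare_exchange_strong(pending, &expected, 0UL);
    return expected;
}

/* Unconditional exchange: return the old value and always store 0. */
static unsigned long take_fixed(atomic_ulong *pending)
{
    return atomic_exchange(pending, 0UL);
}
```

Starting from a value of 1, take_buggy() returns 1 and leaves the variable at 1, whereas take_fixed() returns 1 and leaves it at 0.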
20 Jan, 2015
5 commits
-
Commit
c11f1df5003d534fd067f0168bfad7befffb3b5c
requires writers to wait for any pending oplock break handler to
complete before proceeding to write. This is done by waiting on bit
CIFS_INODE_PENDING_OPLOCK_BREAK in cifsFileInfo->flags. This bit is
cleared by the oplock break handler job queued on the workqueue once it
has completed handling the oplock break allowing writers to proceed with
writing to the file.

While testing, it was noticed that the filehandle could be closed while
there is a pending oplock break which results in the oplock break
handler on the cifsiod workqueue being cancelled before it has had a
chance to execute and clear the CIFS_INODE_PENDING_OPLOCK_BREAK bit.
Any subsequent attempt to write to this file hangs waiting for the
CIFS_INODE_PENDING_OPLOCK_BREAK bit to be cleared.

We fix this by ensuring that we also clear the bit
CIFS_INODE_PENDING_OPLOCK_BREAK when we remove the oplock break handler
from the workqueue.

The bug was found by Red Hat QA while testing using ltp's fsstress
command.

Signed-off-by: Sachin Prabhu
Acked-by: Shirish Pargaonkar
Signed-off-by: Jeff Layton
Cc: stable@vger.kernel.org
Signed-off-by: Steve French
-
When leaving a function use memzero_explicit instead of memset(0) to
clear stack allocated buffers. memset(0) may be optimized away.

This particular buffer is highly likely to contain sensitive data which
we shouldn't leak (it's named 'passwd' after all).

Signed-off-by: Giel van Schijndel
Acked-by: Herbert Xu
Reported-at: http://www.viva64.com/en/b/0299/
Reported-by: Andrey Karpov
Reported-by: Svyatoslav Razmyslov
Signed-off-by: Steve French
-
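Regarding the memzero_explicit change above: a compiler may drop a memset() on a buffer that is never read again, so code that must scrub secrets needs a clearing routine the optimizer will not discard. The following is a userspace sketch of one common approach, writing through a volatile pointer. zero_explicit() and scrub_demo() are hypothetical names; this is not how the kernel implements memzero_explicit(), and compilers keep such volatile stores in practice rather than by an iron-clad language guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for an explicit clearing helper. */
static void zero_explicit(void *buf, size_t len)
{
    volatile unsigned char *p = buf;

    /* Volatile stores are treated as observable side effects. */
    while (len--)
        *p++ = 0;
}

/*
 * Demo: use a stack buffer holding a secret, then scrub it before
 * returning. Returns the first byte that was used, to show that the
 * buffer was live before it was cleared.
 */
static int scrub_demo(void)
{
    char passwd[16] = "secret";
    int first = passwd[0];

    zero_explicit(passwd, sizeof(passwd));
    assert(passwd[0] == 0);
    return first;
}
```

The design point is the same as in the message: the clearing call at the end of the function must survive dead-store elimination, which a plain memset() on a dying stack buffer does not reliably do.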
Suppress the following warning displayed when building a 32-bit (i686) kernel.
===============================================================================
...
CC [M] fs/btrfs/extent_io.o
fs/btrfs/extent_io.c: In function ‘btrfs_free_io_failure_record’:
fs/btrfs/extent_io.c:2193:13: warning: cast to pointer from integer of
different size [-Wint-to-pointer-cast]
failrec = (struct io_failure_record *)state->private;
...
===============================================================================

Signed-off-by: Satoru Takeuchi
Reported-by: Chris Murphy
Signed-off-by: Chris Mason
-
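Regarding the -Wint-to-pointer-cast warning above: it appears when an integer wider than a pointer is cast straight to a pointer type. A usual remedy is to cast through an integer type of pointer width first. The sketch below assumes the stored value is a 64-bit integer (the message implies this but does not show the declaration) and uses uintptr_t, the userspace counterpart of the pointer-sized integer a kernel would use. stash() and unstash() are hypothetical names, and the struct body is a placeholder.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder body; only the type name comes from the warning text. */
struct io_failure_record {
    int dummy;
};

/* Store a pointer in a 64-bit integer field. */
static uint64_t stash(struct io_failure_record *rec)
{
    return (uint64_t)(uintptr_t)rec;
}

/*
 * Recover the pointer. The intermediate uintptr_t cast narrows the
 * value to pointer width explicitly, so a 32-bit build does not warn
 * about a size mismatch.
 */
static struct io_failure_record *unstash(uint64_t private_val)
{
    return (struct io_failure_record *)(uintptr_t)private_val;
}
```

A pointer stored with stash() and read back with unstash() compares equal to the original on both 32-bit and 64-bit builds.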
When removing a block group we were deleting it from its space_info's
ro_bgs list without the correct protection - the space info's spinlock.
Fix this by doing the list delete while holding the spinlock of the
corresponding space info, which is the correct lock for any operation
on that list.

This issue was introduced in the 3.19 kernel by the following change:
Btrfs: move read only block groups onto their own list V2
commit 633c0aad4c0243a506a3e8590551085ad78af82d

I ran into a kernel crash while a task was running statfs, which iterates
the space_info->ro_bgs list while holding the space info's spinlock,
and another task was deleting it from the same list, without holding that
spinlock, as part of the block group remove operation (while running the
function btrfs_remove_block_group). This happened often when running the
stress test xfstests/generic/038 I recently made.

Signed-off-by: Filipe Manana
Signed-off-by: Chris Mason
-
The address that should be freed is not 'ppath' but 'path'.
Signed-off-by: Tsutomu Itoh
Reviewed-by: Miao Xie
Signed-off-by: Chris Mason