Doug / smarc-fsl-linux-kernel | Embedian Git Server

27 May, 2013

2 commits

89ff77837 Merge tag 'nfs-for-3.10-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs ... Browse Code »

Pull NFS client bugfixes from Trond Myklebust:

- Stable fix to prevent an rpc_task wakeup race
- Fix a NFSv4.1 session drain deadlock
- Fix a NFSv4/v4.1 mount regression when not running rpc.gssd
- Ensure auth_gss pipe detection works in namespaces
- Fix SETCLIENTID fallback if rpcsec_gss is not available

* tag 'nfs-for-3.10-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: Fix SETCLIENTID fallback if GSS is not available
SUNRPC: Prevent an rpc_task wakeup race
NFSv4.1 Fix a pNFS session draining deadlock
SUNRPC: Convert auth_gss pipe detection to work in namespaces
SUNRPC: Faster detection if gssd is actually running
SUNRPC: Fix a bug in gss_create_upcall

Linus Torvalds
2013-05-27 03:33:05 +0800
088d812fe Merge tag 'for-linus-v3.10-rc3' of git://oss.sgi.com/xfs/xfs ... Browse Code »

Pull xfs fixes from Ben Myers:
"Here are fixes for corruption on 512 byte filesystems, a rounding
error, a use-after-free, some flags to fix lockdep reports, and
several fixes related to CRCs. We have a somewhat larger post -rc1
queue than usual due to fixes related to the CRC feature we merged for
3.10:

- Fix for corruption with FSX on 512 byte blocksize filesystems
- Fix rounding error in xfs_free_file_space
- Fix use-after-free with extent free intents
- Add several missing KM_NOFS flags to fix lockdep reports
- Several fixes for CRC related code"

* tag 'for-linus-v3.10-rc3' of git://oss.sgi.com/xfs/xfs:
xfs: remote attribute lookups require the value length
xfs: xfs_attr_shortform_allfit() does not handle attr3 format.
xfs: xfs_da3_node_read_verify() doesn't handle XFS_ATTR3_LEAF_MAGIC
xfs: fix missing KM_NOFS tags to keep lockdep happy
xfs: Don't reference the EFI after it is freed
xfs: fix rounding in xfs_free_file_space
xfs: fix sub-page blocksize data integrity writes

Linus Torvalds
2013-05-27 00:35:02 +0800

25 May, 2013

15 commits

03e04f048 aio: fix kioctx not being freed after cancellation at exit time ... Browse Code »

The recent changes overhauling fs/aio.c introduced a bug that results in
the kioctx not being freed when outstanding kiocbs are cancelled at
exit_aio() time. Specifically, a kiocb that is cancelled has its
completion events discarded by batch_complete_aio(), which then fails to
wake up the process stuck in free_ioctx(). Fix this by modifying the
wait_event() condition in free_ioctx() appropriately.

This patch was tested with the cancel operation in the thread based code
posted yesterday.

[akpm@linux-foundation.org: fix build]
Signed-off-by: Benjamin LaHaise
Signed-off-by: Kent Overstreet
Cc: Kent Overstreet
Cc: Josh Boyer
Cc: Zach Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Benjamin LaHaise
2013-05-25 07:22:53 +0800
b4ca2b4b5 ocfs2: goto out_unlock if ocfs2_get_clusters_nocache() failed in ocfs2_fiemap() ... Browse Code »

Last time we found there is lock/unlock bug in ocfs2_file_aio_write, and
then we did a thorough search for all lock resources in
ocfs2_inode_info, including rw, inode and open lockres and found this
bug. My kernel version is 3.0.13, and it is also in the lastest version
3.9. In ocfs2_fiemap, once ocfs2_get_clusters_nocache failed, it should
goto out_unlock instead of out, because we need release buffer head, up
read alloc sem and unlock inode.

Signed-off-by: Joseph Qi
Reviewed-by: Jie Liu
Cc: Mark Fasheh
Cc: Joel Becker
Acked-by: Sunil Mushran
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joseph Qi
2013-05-25 07:22:52 +0800
136e8770c nilfs2: fix issue of nilfs_set_page_dirty() for page at EOF boundary ... Browse Code »

nilfs2: fix issue of nilfs_set_page_dirty for page at EOF boundary

DESCRIPTION:
There are use-cases when NILFS2 file system (formatted with block size
lesser than 4 KB) can be remounted in RO mode because of encountering of
"broken bmap" issue.

The issue was reported by Anthony Doggett :
"The machine I've been trialling nilfs on is running Debian Testing,
Linux version 3.2.0-4-686-pae (debian-kernel@lists.debian.org) (gcc
version 4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.2.35-2), but I've
also reproduced it (identically) with Debian Unstable amd64 and Debian
Experimental (using the 3.8-trunk kernel). The problematic partitions
were formatted with "mkfs.nilfs2 -b 1024 -B 8192"."

SYMPTOMS:
(1) System log contains error messages likewise:

[63102.496756] nilfs_direct_assign: invalid pointer: 0
[63102.496786] NILFS error (device dm-17): nilfs_bmap_assign: broken bmap (inode number=28)
[63102.496798]
[63102.524403] Remounting filesystem read-only

(2) The NILFS2 file system is remounted in RO mode.

REPRODUSING PATH:
(1) Create volume group with name "unencrypted" by means of vgcreate utility.
(2) Run script (prepared by Anthony Doggett ):

----------------[BEGIN SCRIPT]--------------------

VG=unencrypted
lvcreate --size 2G --name ntest $VG
mkfs.nilfs2 -b 1024 -B 8192 /dev/mapper/$VG-ntest
mkdir /var/tmp/n
mkdir /var/tmp/n/ntest
mount /dev/mapper/$VG-ntest /var/tmp/n/ntest
mkdir /var/tmp/n/ntest/thedir
cd /var/tmp/n/ntest/thedir
sleep 2
date
darcs init
sleep 2
dmesg|tail -n 5
date
darcs whatsnew || true
date
sleep 2
dmesg|tail -n 5
----------------[END SCRIPT]--------------------

REPRODUCIBILITY: 100%

INVESTIGATION:
As it was discovered, the issue takes place during segment
construction after executing such sequence of user-space operations:

open("_darcs/index", O_RDWR|O_CREAT|O_NOCTTY, 0666) = 7
fstat(7, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
ftruncate(7, 60)

The error message "NILFS error (device dm-17): nilfs_bmap_assign: broken
bmap (inode number=28)" takes place because of trying to get block
number for third block of the file with logical offset #3072 bytes. As
it is possible to see from above output, the file has 60 bytes of the
whole size. So, it is enough one block (1 KB in size) allocation for
the whole file. Trying to operate with several blocks instead of one
takes place because of discovering several dirty buffers for this file
in nilfs_segctor_scan_file() method.

The root cause of this issue is in nilfs_set_page_dirty function which
is called just before writing to an mmapped page.

When nilfs_page_mkwrite function handles a page at EOF boundary, it
fills hole blocks only inside EOF through __block_page_mkwrite().

The __block_page_mkwrite() function calls set_page_dirty() after filling
hole blocks, thus nilfs_set_page_dirty function (=
a_ops->set_page_dirty) is called. However, the current implementation
of nilfs_set_page_dirty() wrongly marks all buffers dirty even for page
at EOF boundary.

As a result, buffers outside EOF are inconsistently marked dirty and
queued for write even though they are not mapped with nilfs_get_block
function.

FIX:
This modifies nilfs_set_page_dirty() not to mark hole blocks dirty.

Thanks to Vyacheslav Dubeyko for his effort on analysis and proposals
for this issue.

Signed-off-by: Ryusuke Konishi
Reported-by: Anthony Doggett
Reported-by: Vyacheslav Dubeyko
Cc: Vyacheslav Dubeyko
Tested-by: Ryusuke Konishi
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ryusuke Konishi
2013-05-25 07:22:52 +0800
6900807c6 aio: fix io_getevents documentation ... Browse Code »

In reviewing man pages, I noticed that io_getevents is documented to
update the timeout that gets passed into the library call. This doesn't
happen in kernel space or in the library (even though it's documented to
do so in both places). Unless there is objection, I'd like to fix the
comments/docs to match the code (I will also update the man page upon
consensus).

Signed-off-by: Jeff Moyer
Signed-off-by: Benjamin LaHaise
Acked-by: Cyril Hrubis
Acked-by: Michael Kerrisk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jeff Moyer
2013-05-25 07:22:52 +0800
fb09c3733 hfs: avoid crash in hfs_bnode_create ... Browse Code »

Commit 634725a92938 ("hfs: cleanup HFS+ prints") removed the BUG_ON in
hfs_bnode_create in hfsplus. This patch removes it from the hfs version
and avoids an fsfuzzer crash.

Signed-off-by: Jeff Mahoney
Acked-by: Jeff Mahoney
Signed-off-by: Jiri Slaby
Cc: Vyacheslav Dubeyko
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jeff Mahoney
2013-05-25 07:22:51 +0800
afe1bb73f ocfs2: unlock rw lock if inode lock failed ... Browse Code »

In ocfs2_file_aio_write(), it does ocfs2_rw_lock() first and then
ocfs2_inode_lock().

But if ocfs2_inode_lock() failed, it goes to out_sems without unlocking
rw lock. This will cause a bug in ocfs2_lock_res_free() when testing
res->l_ex_holders, which is increased in __ocfs2_cluster_lock() and
decreased in __ocfs2_cluster_unlock().

Signed-off-by: Joseph Qi
Cc: Joel Becker
Cc: Mark Fasheh
Cc: Li Zefan
Cc: "Duyongfeng (B)"
Acked-by: Sunil Mushran
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joseph Qi
2013-05-25 07:22:51 +0800
7b92d03c3 fat: fix possible overflow for fat_clusters ... Browse Code »

Intermediate value of fat_clusters can be overflowed on 32bits arch.

Reported-by: Krzysztof Strasburger
Signed-off-by: OGAWA Hirofumi
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

OGAWA Hirofumi
2013-05-25 07:22:50 +0800
7ae077802 xfs: remote attribute lookups require the value length ... Browse Code »

When reading a remote attribute, to correctly calculate the length
of the data buffer for CRC enable filesystems, we need to know the
length of the attribute data. We get this information when we look
up the attribute, but we don't store it in the args structure along
with the other remote attr information we get from the lookup. Add
this information to the args structure so we can use it
appropriately.

Signed-off-by: Dave Chinner
Reviewed-by: Ben Myers
Signed-off-by: Ben Myers

(cherry picked from commit e461fcb194172b3f709e0b478d2ac1bdac7ab9a3)

Dave Chinner
2013-05-25 05:31:20 +0800
cf257abf0 xfs: xfs_attr_shortform_allfit() does not handle attr3 format. ... Browse Code »

xfstests generic/117 fails with:

XFS: Assertion failed: leaf->hdr.info.magic == cpu_to_be16(XFS_ATTR_LEAF_MAGIC)

indicating a function that does not handle the attr3 format
correctly. Fix it.

Signed-off-by: Dave Chinner
Reviewed-by: Ben Myers
Signed-off-by: Ben Myers
(cherry picked from commit b38958d715316031fe9ea0cc6c22043072a55f49)

Dave Chinner
2013-05-25 05:29:56 +0800
7ced60cae xfs: xfs_da3_node_read_verify() doesn't handle XFS_ATTR3_LEAF_MAGIC ... Browse Code »

Signed-off-by: Dave Chinner
Reviewed-by: Ben Myers
Signed-off-by: Ben Myers

(cherry picked from commit 72916fb8cbcf0c2928f56cdc2fbe8c7bf5517758)

Dave Chinner
2013-05-25 05:29:37 +0800
b17cb364d xfs: fix missing KM_NOFS tags to keep lockdep happy ... Browse Code »

There are several places where we use KM_SLEEP allocation contexts
and use the fact that they are called from transaction context to
add KM_NOFS where appropriate. Unfortunately, there are several
places where the code makes this assumption but can be called from
outside transaction context but with filesystem locks held. These
places need explicit KM_NOFS annotations to avoid lockdep
complaining about reclaim contexts.

Signed-off-by: Dave Chinner
Reviewed-by: Ben Myers
Signed-off-by: Ben Myers

(cherry picked from commit ac14876cf9255175bf3bdad645bf8aa2b8fb2d7c)

Dave Chinner
2013-05-25 05:29:15 +0800
509e708a8 xfs: Don't reference the EFI after it is freed ... Browse Code »

Checking the EFI for whether it is being released from recovery
after we've already released the known active reference is a mistake
worthy of a brown paper bag. Fix the (now) obvious use after free
that it can cause.

Reported-by: Dave Jones
Signed-off-by: Dave Chinner
Reviewed-by: Brian Foster
Signed-off-by: Ben Myers

(cherry picked from commit 52c24ad39ff02d7bd73c92eb0c926fb44984a41d)

Dave Chinner
2013-05-25 05:27:57 +0800
7031d0e1c xfs: fix rounding in xfs_free_file_space ... Browse Code »

The offset passed into xfs_free_file_space() needs to be rounded
down to a certain size, but the rounding mask is built by a 32 bit
variable. Hence the mask will always mask off the upper 32 bits of
the offset and lead to incorrect writeback and invalidation ranges.

This is not actually exposed as a bug because we writeback and
invalidate from the rounded offset to the end of the file, and hence
the offset we are actually punching a hole out of will always be
covered by the code. This needs fixing, however, if we ever want to
use exact ranges for writeback/invalidation here...

Signed-off-by: Dave Chinner
Reviewed-by: Brian Foster
Signed-off-by: Ben Myers

(cherry picked from commit 28ca489c63e9aceed8801d2f82d731b3c9aa50f5)

Dave Chinner
2013-05-25 05:27:41 +0800
480d7467e xfs: fix sub-page blocksize data integrity writes ... Browse Code »

FSX on 512 byte block size filesystems has been failing for some
time with corrupted data. The fault dates back to the change in
the writeback data integrity algorithm that uses a mark-and-sweep
approach to avoid data writeback livelocks.

Unfortunately, a side effect of this mark-and-sweep approach is that
each page will only be written once for a data integrity sync, and
there is a condition in writeback in XFS where a page may require
two writeback attempts to be fully written. As a result of the high
level change, we now only get a partial page writeback during the
integrity sync because the first pass through writeback clears the
mark left on the page index to tell writeback that the page needs
writeback....

The cause is writing a partial page in the clustering code. This can
happen when a mapping boundary falls in the middle of a page - we
end up writing back the first part of the page that the mapping
covers, but then never revisit the page to have the remainder mapped
and written.

The fix is simple - if the mapping boundary falls inside a page,
then simple abort clustering without touching the page. This means
that the next ->writepage entry that write_cache_pages() will make
is the page we aborted on, and xfs_vm_writepage() will map all
sections of the page correctly. This behaviour is also optimal for
non-data integrity writes, as it results in contiguous sequential
writeback of the file rather than missing small holes and having to
write them a "random" writes in a future pass.

With this fix, all the fsx tests in xfstests now pass on a 512 byte
block size filesystem on a 4k page machine.

Signed-off-by: Dave Chinner
Reviewed-by: Brian Foster
Signed-off-by: Ben Myers

(cherry picked from commit 49b137cbbcc836ef231866c137d24f42c42bb483)

Dave Chinner
2013-05-25 05:26:51 +0800
a8432588f Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6 ... Browse Code »

Pull CIFS fix from Steve French:
"One cifs fix to merge now - fixes possible DFS oops (I expect to
request a merge of 4 additional cifs fixes next week)"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
cifs: only set ops for inodes in I_NEW state

Linus Torvalds
2013-05-25 01:45:59 +0800

24 May, 2013

5 commits

e97e548ba GFS2: Fix typo in gfs2_log_end_write loop ... Browse Code »

There was a missing _all in this loop iterator

Signed-off-by: Steven Whitehouse

Steven Whitehouse
2013-05-24 20:48:09 +0800
75f96ce6e GFS2: fix DLM depends to fix build errors ... Browse Code »

Fix build errors by correcting DLM dependencies in GFS2.
Build errors happen when CONFIG_GFS2_FS_LOCKING_DLM=y and CONFIG_DLM=m:

fs/built-in.o: In function `gfs2_lock':
file.c:(.text+0xc7abd): undefined reference to `dlm_posix_get'
file.c:(.text+0xc7ad0): undefined reference to `dlm_posix_unlock'
file.c:(.text+0xc7ad9): undefined reference to `dlm_posix_lock'
fs/built-in.o: In function `gdlm_unmount':
lock_dlm.c:(.text+0xd6e5b): undefined reference to `dlm_release_lockspace'
fs/built-in.o: In function `sync_unlock':
lock_dlm.c:(.text+0xd6e9e): undefined reference to `dlm_unlock'
fs/built-in.o: In function `sync_lock':
lock_dlm.c:(.text+0xd6fb6): undefined reference to `dlm_lock'
fs/built-in.o: In function `gdlm_put_lock':
lock_dlm.c:(.text+0xd7238): undefined reference to `dlm_unlock'
fs/built-in.o: In function `gdlm_mount':
lock_dlm.c:(.text+0xd753e): undefined reference to `dlm_new_lockspace'
lock_dlm.c:(.text+0xd79d3): undefined reference to `dlm_release_lockspace'
fs/built-in.o: In function `gdlm_lock':
lock_dlm.c:(.text+0xd8179): undefined reference to `dlm_lock'
fs/built-in.o: In function `gdlm_cancel':
lock_dlm.c:(.text+0xd6b22): undefined reference to `dlm_unlock'

Signed-off-by: Randy Dunlap
Signed-off-by: Steven Whitehouse

Randy Dunlap
2013-05-24 20:47:53 +0800
af21ca8ed GFS2: Use single-block reservations for directories ... Browse Code »

This patch changes the multi-block allocation code, such that
directory inodes only get a single block reserved in the bitmap.
That way, the bitmaps are more tightly packed together, and there
are fewer spans of free blocks for in-use block reservations.
This means it takes less time to find a free span of blocks in the
bitmap, which speeds things up. This increases the performance of
some workloads by almost 2X. In Nate's mockup.py script (which does
(1) create dir, (2) create dir in dir, (3) create file in that dir)
the test executes in 23 steps rather than 43 steps, a 47%
performance improvement.

Signed-off-by: Bob Peterson
Signed-off-by: Steven Whitehouse

Bob Peterson
2013-05-24 20:47:32 +0800
37f715774 GFS2: two minor quota fixups ... Browse Code »

This patch fixes two regression problems that Abhi found in the
GFS2 quota code.

Signed-off-by: Bob Peterson
Signed-off-by: Steven Whitehouse

Bob Peterson
2013-05-24 20:47:13 +0800
83c168bf8 NFS: Fix SETCLIENTID fallback if GSS is not available ... Browse Code »

Commit 79d852bf "NFS: Retry SETCLIENTID with AUTH_SYS instead of
AUTH_NONE" did not take into account commit 23631227 "NFSv4: Fix the
fallback to AUTH_NULL if krb5i is not available".

Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust

Chuck Lever
2013-05-24 06:50:40 +0800

21 May, 2013

1 commit

774d5f14e NFSv4.1 Fix a pNFS session draining deadlock ... Browse Code »

On a CB_RECALL the callback service thread flushes the inode using
filemap_flush prior to scheduling the state manager thread to return the
delegation. When pNFS is used and I/O has not yet gone to the data server
servicing the inode, a LAYOUTGET can preceed the I/O. Unlike the async
filemap_flush call, the LAYOUTGET must proceed to completion.

If the state manager starts to recover data while the inode flush is sending
the LAYOUTGET, a deadlock occurs as the callback service thread holds the
single callback session slot until the flushing is done which blocks the state
manager thread, and the state manager thread has set the session draining bit
which puts the inode flush LAYOUTGET RPC to sleep on the forechannel slot
table waitq.

Separate the draining of the back channel from the draining of the fore channel
by moving the NFS4_SESSION_DRAINING bit from session scope into the fore
and back slot tables. Drain the back channel first allowing the LAYOUTGET
call to proceed (and fail) so the callback service thread frees the callback
slot. Then proceed with draining the forechannel.

Signed-off-by: Andy Adamson
Signed-off-by: Trond Myklebust

Andy Adamson
2013-05-21 02:20:14 +0800

19 May, 2013

1 commit

130901ba3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs ... Browse Code »

Pull btrfs fixes from Chris Mason:
"Miao Xie has been very busy, fixing races and enospc problems and many
other small but important pieces.

Alexandre Oliva discovered some problems with how our error handling
was interacting with the block layer and for now has disabled our
partial handling of sub-page writes. The real sub-page work is in a
series of patches from IBM that we still need to integrate and test.
The code Alexandre has turned off was really incomplete.

Josef has more error handling fixes and an important fix for the new
skinny extent format.

This also has my fix for the tracepoint crash from late in 3.9. It's
the first stage in a larger clean up to get rid of btrfs_bio and make
a proper bioset for all the items we need to tack into the bio. For
now the bioset only holds our mirror_num and stripe_index, but for the
next merge window I'll shuffle more in."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (25 commits)
Btrfs: use a btrfs bioset instead of abusing bio internals
Btrfs: make sure roots are assigned before freeing their nodes
Btrfs: explicitly use global_block_rsv for quota_tree
btrfs: do away with non-whole_page extent I/O
Btrfs: don't invoke btrfs_invalidate_inodes() in the spin lock context
Btrfs: remove BUG_ON() in btrfs_read_fs_tree_no_radix()
Btrfs: pause the space balance when remounting to R/O
Btrfs: fix unprotected root node of the subvolume's inode rb-tree
Btrfs: fix accessing a freed tree root
Btrfs: return errno if possible when we fail to allocate memory
Btrfs: update the global reserve if it is empty
Btrfs: don't steal the reserved space from the global reserve if their space type is different
Btrfs: optimize the error handle of use_block_rsv()
Btrfs: don't use global block reservation for inode cache truncation
Btrfs: don't abort the current transaction if there is no enough space for inode cache
Correct allowed raid levels on balance.
Btrfs: fix possible memory leak in replace_path()
Btrfs: fix possible memory leak in the find_parent_nodes()
Btrfs: don't allow device replace on RAID5/RAID6
Btrfs: handle running extent ops with skinny metadata
...

Linus Torvalds
2013-05-19 02:35:28 +0800

18 May, 2013

16 commits

c5cb6a057 Merge branch 'for-chris' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next Browse Code »

Chris Mason
2013-05-18 09:53:17 +0800
9be3395bc Btrfs: use a btrfs bioset instead of abusing bio internals ... Browse Code »

Btrfs has been pointer tagging bi_private and using bi_bdev
to store the stripe index and mirror number of failed IOs.

As bios bubble back up through the call chain, we use these
to decide if and how to retry our IOs. They are also used
to count IO failures on a per device basis.

Recently a bio tracepoint was added lead to crashes because
we were abusing bi_bdev.

This commit adds a btrfs bioset, and creates explicit fields
for the mirror number and stripe index. The plan is to
extend this structure for all of the fields currently in
struct btrfs_bio, which will mean one less kmalloc in
our IO path.

Signed-off-by: Chris Mason
Reported-by: Tejun Heo

Chris Mason
2013-05-18 09:52:52 +0800
655b09fe5 Btrfs: make sure roots are assigned before freeing their nodes ... Browse Code »

If we fail to load the chunk tree we'll call free_root_pointers, except we may
not have assigned the roots for the dev_root/extent_root/csum_root yet, so we
could NULL pointer deref at this point. Just add checks to make sure these
roots are set to keep us from panicing. Thanks,

Signed-off-by: Josef Bacik

Josef Bacik
2013-05-18 09:40:38 +0800
3a6cad900 Btrfs: explicitly use global_block_rsv for quota_tree ... Browse Code »

The quota_tree was set up to use the empty_block_rsv before
which would be problematic when the filesystem is filled up
and ENOSPC happens during internal operations while the quota
tree is updated and COWed (when the btrfs_qgroup_info_item
items) are written. In fact, use_block_rsv() which is used
in btrfs_cow_block() falls back to the global_block_rsv in
this case. But just in order to make it more clear what is
happening, change it to explicitly use the global_block_rsv.

Signed-off-by: Stefan Behrens
Signed-off-by: Josef Bacik

Stefan Behrens
2013-05-18 09:40:36 +0800
17a5adccf btrfs: do away with non-whole_page extent I/O ... Browse Code »

end_bio_extent_readpage computes whole_page based on bv_offset and
bv_len, without taking into account that blk_update_request may modify
them when some of the blocks to be read into a page produce a read
error. This would cause the read to unlock only part of the file
range associated with the page, which would in turn leave the entire
page locked, which would not only keep the process blocked instead of
returning -EIO to it, but also prevent any further access to the file.

It turns out that btrfs always issues whole-page reads and writes.
The special handling of non-whole_page appears to be a mistake or a
left-over from a time when this wasn't the case. Indeed,
end_bio_extent_writepage distinguished between whole_page and
non-whole_page writes but behaved identically in both cases!

I've replaced the whole_page computations with warnings, just to be
sure that we're not issuing partial page reads or writes. The
warnings should probably just go away some time.

Signed-off-by: Alexandre Oliva
Signed-off-by: Josef Bacik

Alexandre Oliva
2013-05-18 09:40:35 +0800
b216cbfb5 Btrfs: don't invoke btrfs_invalidate_inodes() in the spin lock context ... Browse Code »

btrfs_invalidate_inodes() may sleep, so we should not invoke it in the
spin lock context. Fix it.

Signed-off-by: Miao Xie
Signed-off-by: Josef Bacik

Miao Xie
2013-05-18 09:40:34 +0800
314297c2a Btrfs: remove BUG_ON() in btrfs_read_fs_tree_no_radix() ... Browse Code »

We have checked if ->node is NULL or not, so it is unnecessary to
use BUG_ON() to check again. Remove it.

Signed-off-by: Miao Xie
Signed-off-by: Josef Bacik

Miao Xie
2013-05-18 09:40:32 +0800
061594ef1 Btrfs: pause the space balance when remounting to R/O ... Browse Code »

Signed-off-by: Miao Xie
Signed-off-by: Josef Bacik

Miao Xie
2013-05-18 09:40:31 +0800
e1409cef8 Btrfs: fix unprotected root node of the subvolume's inode rb-tree ... Browse Code »

The root node of the rb-tree may be changed, so we should get it under
the lock. Fix it.

Signed-off-by: Miao Xie
Signed-off-by: Josef Bacik

Miao Xie
2013-05-18 09:40:30 +0800
89042e5ad Btrfs: fix accessing a freed tree root ... Browse Code »

inode_tree_del() will move the tree root into the dead root list, and
then the tree will be destroyed by the cleaner. So if we remove the
delayed node which is cached in the inode after inode_tree_del(),
we may access a freed tree root. Fix it.

Signed-off-by: Miao Xie
Signed-off-by: Josef Bacik

Miao Xie
2013-05-18 09:40:29 +0800
b9aa55bed Btrfs: return errno if possible when we fail to allocate memory ... Browse Code »

We need to set return value explicitly, otherwise we'll lose the error
value.

Signed-off-by: Liu Bo
Signed-off-by: Josef Bacik

Liu Bo
2013-05-18 09:40:27 +0800
d88033dbf Btrfs: update the global reserve if it is empty ... Browse Code »

Before applying this patch, we reserved the space for the global reserve
by the minimum unit if we found it is empty, it was unreasonable and
inefficient, because if the global reserve space was depleted, it implied
that the size of the global reserve was too small. In this case, we shoud
update the global reserve and fill it.

Cc: Tsutomu Itoh
Signed-off-by: Miao Xie
Signed-off-by: Josef Bacik

Miao Xie
2013-05-18 09:40:26 +0800
5881cfc92 Btrfs: don't steal the reserved space from the global reserve if their space type is different ... Browse Code »

If the type of the space we need is different with the global reserve, we
can not steal the space from the global reserve, because we can not allocate
the space from the free space cache that the global reserve points to.

Cc: Tsutomu Itoh
Signed-off-by: Miao Xie
Signed-off-by: Josef Bacik

Miao Xie
2013-05-18 09:40:25 +0800
b586b3237 Btrfs: optimize the error handle of use_block_rsv() ... Browse Code »

cc: Tsutomu Itoh
Signed-off-by: Miao Xie
Signed-off-by: Josef Bacik

Miao Xie
2013-05-18 09:40:24 +0800
7b61cd922 Btrfs: don't use global block reservation for inode cache truncation ... Browse Code »

It is very likely that there are lots of subvolumes/snapshots in the filesystem,
so if we use global block reservation to do inode cache truncation, we may hog
all the free space that is reserved in global rsv. So it is better that we do
the free space reservation for inode cache truncation by ourselves.

Cc: Tsutomu Itoh
Signed-off-by: Miao Xie
Signed-off-by: Josef Bacik

Miao Xie
2013-05-18 09:40:22 +0800
7cfa9e51d Btrfs: don't abort the current transaction if there is no enough space for inode cache ... Browse Code »

The filesystem with inode cache was forced to be read-only when we umounted it.

Steps to reproduce:
# mkfs.btrfs -f ${DEV}
# mount -o inode_cache ${DEV} ${MNT}
# dd if=/dev/zero of=${MNT}/file1 bs=1M count=8192
# btrfs fi syn ${MNT}
# dd if=${MNT}/file1 of=/dev/null bs=1M
# rm -f ${MNT}/file1
# btrfs fi syn ${MNT}
# umount ${MNT}

It is because there was no enough space to do inode cache truncation, and then
we aborted the current transaction.

But no space error is not a serious problem when we write out the inode cache,
and it is safe that we just skip this step if we meet this problem. So we need
not abort the current transaction.

Reported-by: Tsutomu Itoh
Signed-off-by: Miao Xie
Tested-by: Tsutomu Itoh
Signed-off-by: Josef Bacik

Miao Xie
2013-05-18 09:40:21 +0800