Eric Lee / smarc-fsl-linux-kernel

13 Apr, 2016

1 commit

03a8bb0e5 ext4/fscrypto: avoid RCU lookup in d_revalidate ... Browse Code »

As Al pointed, d_revalidate should return RCU lookup before using d_inode.
This was originally introduced by:
commit 34286d666230 ("fs: rcu-walk aware d_revalidate method").

Reported-by: Al Viro
Signed-off-by: Jaegeuk Kim
Cc: Theodore Ts'o
Cc: stable

Jaegeuk Kim
2016-04-13 11:01:35 +0800

11 Apr, 2016

1 commit

9f2394c9b Revert "ext4: allow readdir()'s of large empty directories to be interrupted" ... Browse Code »

This reverts commit 1028b55bafb7611dda1d8fed2aeca16a436b7dff.

It's broken: it makes ext4 return an error at an invalid point, causing
the readdir wrappers to write the the position of the last successful
directory entry into the position field, which means that the next
readdir will now return that last successful entry _again_.

You can only return fatal errors (that terminate the readdir directory
walk) from within the filesystem readdir functions, the "normal" errors
(that happen when the readdir buffer fills up, for example) happen in
the iterorator where we know the position of the actual failing entry.

I do have a very different patch that does the "signal_pending()"
handling inside the iterator function where it is allowable, but while
that one passes all the sanity checks, I screwed up something like four
times while emailing it out, so I'm not going to commit it today.

So my track record is not good enough, and the stars will have to align
better before that one gets committed. And it would be good to get some
review too, of course, since celestial alignments are always an iffy
debugging model.

IOW, let's just revert the commit that caused the problem for now.

Reported-by: Greg Thelen
Cc: Theodore Ts'o
Signed-off-by: Linus Torvalds

Linus Torvalds
2016-04-11 07:52:24 +0800

08 Apr, 2016

1 commit

93061f390 Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 ... Browse Code »

Pull ext4 bugfixes from Ted Ts'o:
"These changes contains a fix for overlayfs interacting with some
(badly behaved) dentry code in various file systems. These have been
reviewed by Al and the respective file system mtinainers and are going
through the ext4 tree for convenience.

This also has a few ext4 encryption bug fixes that were discovered in
Android testing (yes, we will need to get these sync'ed up with the
fs/crypto code; I'll take care of that). It also has some bug fixes
and a change to ignore the legacy quota options to allow for xfstests
regression testing of ext4's internal quota feature and to be more
consistent with how xfs handles this case"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: ignore quota mount options if the quota feature is enabled
ext4 crypto: fix some error handling
ext4: avoid calling dquot_get_next_id() if quota is not enabled
ext4: retry block allocation for failed DIO and DAX writes
ext4: add lockdep annotations for i_data_sem
ext4: allow readdir()'s of large empty directories to be interrupted
btrfs: fix crash/invalid memory access on fsync when using overlayfs
ext4 crypto: use dget_parent() in ext4_d_revalidate()
ext4: use file_dentry()
ext4: use dget_parent() in ext4_file_open()
nfs: use file_dentry()
fs: add file_dentry()
ext4 crypto: don't let data integrity writebacks fail with ENOMEM
ext4: check if in-inode xattr is corrupted in ext4_expand_extra_isize_ea()

Linus Torvalds
2016-04-08 08:22:20 +0800

05 Apr, 2016

2 commits

ea1754a08 mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage ... Browse Code »

Mostly direct substitution with occasional adjustment or removing
outdated comments.

Signed-off-by: Kirill A. Shutemov
Acked-by: Michal Hocko
Signed-off-by: Linus Torvalds

Kirill A. Shutemov
2016-04-05 01:41:08 +0800
09cbfeaf1 mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros ... Browse Code »

PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized. And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.

Let's stop pretending that pages in page cache are special. They are
not.

The changes are pretty straight-forward:

- << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;

- >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;

- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

- page_cache_get() -> get_page();

- page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov
Acked-by: Michal Hocko
Signed-off-by: Linus Torvalds

Kirill A. Shutemov
2016-04-05 01:41:08 +0800

04 Apr, 2016

1 commit

c325a67c7 ext4: ignore quota mount options if the quota feature is enabled ... Browse Code »

Previously, ext4 would fail the mount if the file system had the quota
feature enabled and quota mount options (used for the older quota
setups) were present. This broke xfstests, since xfs silently ignores
the usrquote and grpquota mount options if they are specified. This
commit changes things so that we are consistent with xfs; having the
mount options specified is harmless, so no sense break users by
forbidding them.

Cc: stable@vger.kernel.org
Signed-off-by: Theodore Ts'o

Theodore Ts'o
2016-04-04 05:03:37 +0800

03 Apr, 2016

1 commit

4762cc3fb ext4 crypto: fix some error handling ... Browse Code »

We should be testing for -ENOMEM but the minus sign is missing.

Fixes: c9af28fdd449 ('ext4 crypto: don't let data integrity writebacks fail with ENOMEM')
Signed-off-by: Dan Carpenter
Signed-off-by: Theodore Ts'o

Dan Carpenter
2016-04-03 06:13:38 +0800

02 Apr, 2016

1 commit

8f0e8746b ext4: avoid calling dquot_get_next_id() if quota is not enabled ... Browse Code »

This should be fixed in the quota layer so we can test with the quota
mutex held, but for now, we need this to avoid tests from crashing the
kernel aborting the regression test suite.

Signed-off-by: Theodore Ts'o

Theodore Ts'o
2016-04-02 00:00:03 +0800

01 Apr, 2016

2 commits

e84dfbe2b ext4: retry block allocation for failed DIO and DAX writes ... Browse Code »

Currently if block allocation for DIO or DAX write fails due to ENOSPC,
we just returned it to userspace. However these ENOSPC errors can be
transient because the transaction freeing blocks has not yet committed.
This demonstrates as failures of generic/102 test when the filesystem is
mounted with 'dax' mount option.

Fix the problem by properly retrying the allocation in case of ENOSPC
error in get blocks functions used for direct IO.

Signed-off-by: Jan Kara
Signed-off-by: Theodore Ts'o
Tested-by: Ross Zwisler

Jan Kara
2016-04-01 14:07:22 +0800
daf647d2d ext4: add lockdep annotations for i_data_sem ... Browse Code »

With the internal Quota feature, mke2fs creates empty quota inodes and
quota usage tracking is enabled as soon as the file system is mounted.
Since quotacheck is no longer preallocating all of the blocks in the
quota inode that are likely needed to be written to, we are now seeing
a lockdep false positive caused by needing to allocate a quota block
from inside ext4_map_blocks(), while holding i_data_sem for a data
inode. This results in this complaint:

Possible unsafe locking scenario:

CPU0 CPU1
---- ----
lock(&ei->i_data_sem);
lock(&s->s_dquot.dqio_mutex);
lock(&ei->i_data_sem);
lock(&s->s_dquot.dqio_mutex);

Google-Bug-Id: 27907753

Signed-off-by: Theodore Ts'o
Cc: stable@vger.kernel.org

Theodore Ts'o
2016-04-01 13:31:28 +0800

31 Mar, 2016

1 commit

1028b55ba ext4: allow readdir()'s of large empty directories to be interrupted ... Browse Code »

If a directory has a large number of empty blocks, iterating over all
of them can take a long time, leading to scheduler warnings and users
getting irritated when they can't kill a process in the middle of one
of these long-running readdir operations. Fix this by adding checks to
ext4_readdir() and ext4_htree_fill_tree().

Reported-by: Benjamin LaHaise
Google-Bug-Id: 27880676
Signed-off-by: Theodore Ts'o

Theodore Ts'o
2016-03-31 10:36:24 +0800

27 Mar, 2016

4 commits

3d43bcfef ext4 crypto: use dget_parent() in ext4_d_revalidate() ... Browse Code »

This avoids potential problems caused by a race where the inode gets
renamed out from its parent directory and the parent directory is
deleted while ext4_d_revalidate() is running.

Fixes: 28b4c263961c
Reported-by: Al Viro
Signed-off-by: Theodore Ts'o
Cc: stable@vger.kernel.org

Theodore Ts'o
2016-03-27 04:15:42 +0800
c0a37d487 ext4: use file_dentry() ... Browse Code »

EXT4 may be used as lower layer of overlayfs and accessing f_path.dentry
can lead to a crash.

Fix by replacing direct access of file->f_path.dentry with the
file_dentry() accessor, which will always return a native object.

Reported-by: Daniel Axtens
Fixes: 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay")
Fixes: ff978b09f973 ("ext4 crypto: move context consistency check to ext4_file_open()")
Signed-off-by: Miklos Szeredi
Signed-off-by: Theodore Ts'o
Cc: David Howells
Cc: Al Viro
Cc: # v4.5

Miklos Szeredi
2016-03-27 04:14:42 +0800
9dd78d8c9 ext4: use dget_parent() in ext4_file_open() ... Browse Code »

In f_op->open() lock on parent is not held, so there's no guarantee that
parent dentry won't go away at any time.

Even after this patch there's no guarantee that 'dir' will stay the parent
of 'inode', but at least it won't be freed while being used.

Fixes: ff978b09f973 ("ext4 crypto: move context consistency check to ext4_file_open()")
Signed-off-by: Miklos Szeredi
Signed-off-by: Theodore Ts'o
Cc: # v4.5

Miklos Szeredi
2016-03-27 04:14:41 +0800
c9af28fdd ext4 crypto: don't let data integrity writebacks fail with ENOMEM ... Browse Code »

We don't want the writeback triggered from the journal commit (in
data=writeback mode) to cause the journal to abort due to
generic_writepages() returning an ENOMEM error. In addition, if
fsync() fails with ENOMEM, most applications will probably not do the
right thing.

So if we are doing a data integrity sync, and ext4_encrypt() returns
ENOMEM, we will submit any queued I/O to date, and then retry the
allocation using GFP_NOFAIL.

Google-Bug-Id: 27641567

Signed-off-by: Theodore Ts'o

Theodore Ts'o
2016-03-27 04:14:34 +0800

23 Mar, 2016

2 commits

121cef8f1 ext4: in ext4_dir_llseek, check syscall bitness directly ... Browse Code »

ext4 treats directory offsets differently for 32-bit and 64-bit callers.
Check the caller type using in_compat_syscall, not is_compat_task. This
changes behavior on SPARC slightly.

Signed-off-by: Andy Lutomirski
Cc: "Theodore Ts'o"
Cc: Andreas Dilger
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andy Lutomirski
2016-03-23 06:36:02 +0800
9e92f48c3 ext4: check if in-inode xattr is corrupted in ext4_expand_extra_isize_ea() ... Browse Code »

We aren't checking to see if the in-inode extended attribute is
corrupted before we try to expand the inode's extra isize fields.

This can lead to potential crashes caused by the BUG_ON() check in
ext4_xattr_shift_entries().

Signed-off-by: Theodore Ts'o

Theodore Ts'o
2016-03-23 04:13:15 +0800

22 Mar, 2016

2 commits

77d913178 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs ... Browse Code »

Pull UDF and quota updates from Jan Kara:
"This contains a rewrite of UDF handling of filename encoding to fix
remaining overflow issues from Andrew Gabbasov and quota changes to
support new Q_[X]GETNEXTQUOTA quotactl for VFS quota formats"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
quota: Fix possible GPF due to uninitialised pointers
ext4: Make Q_GETNEXTQUOTA work for quota in hidden inodes
quota: Forbid Q_GETQUOTA and Q_GETNEXTQUOTA for frozen filesystem
quota: Fix possible races during quota loading
ocfs2: Implement get_next_id()
quota_v2: Implement get_next_id() for V2 quota format
quota: Add support for ->get_nextdqblk() for VFS quota
udf: Merge linux specific translation into CS0 conversion function
udf: Remove struct ustr as non-needed intermediate storage
udf: Use separate buffer for copying split names
udf: Adjust UDF_NAME_LEN to better reflect actual restrictions
udf: Join functions for UTF8 and NLS conversions
udf: Parameterize output length in udf_put_filename
quota: Allow Q_GETQUOTA for frozen filesystem
quota: Fixup comments about return value of Q_[X]GETNEXTQUOTA

Linus Torvalds
2016-03-22 03:22:37 +0800
53d2e6976 Merge tag 'xfs-for-linus-4.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs ... Browse Code »

Pull xfs updates from Dave Chinner:
"There's quite a lot in this request, and there's some cross-over with
ext4, dax and quota code due to the nature of the changes being made.

As for the rest of the XFS changes, there are lots of little things
all over the place, which add up to a lot of changes in the end.

The major changes are that we've reduced the size of the struct
xfs_inode by ~100 bytes (gives an inode cache footprint reduction of
>10%), the writepage code now only does a single set of mapping tree
lockups so uses less CPU, delayed allocation reservations won't
overrun under random write loads anymore, and we added compile time
verification for on-disk structure sizes so we find out when a commit
or platform/compiler change breaks the on disk structure as early as
possible.

Change summary:

- error propagation for direct IO failures fixes for both XFS and
ext4
- new quota interfaces and XFS implementation for iterating all the
quota IDs in the filesystem
- locking fixes for real-time device extent allocation
- reduction of duplicate information in the xfs and vfs inode, saving
roughly 100 bytes of memory per cached inode.
- buffer flag cleanup
- rework of the writepage code to use the generic write clustering
mechanisms
- several fixes for inode flag based DAX enablement
- rework of remount option parsing
- compile time verification of on-disk format structure sizes
- delayed allocation reservation overrun fixes
- lots of little error handling fixes
- small memory leak fixes
- enable xfsaild freezing again"

* tag 'xfs-for-linus-4.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (66 commits)
xfs: always set rvalp in xfs_dir2_node_trim_free
xfs: ensure committed is initialized in xfs_trans_roll
xfs: borrow indirect blocks from freed extent when available
xfs: refactor delalloc indlen reservation split into helper
xfs: update freeblocks counter after extent deletion
xfs: debug mode forced buffered write failure
xfs: remove impossible condition
xfs: check sizes of XFS on-disk structures at compile time
xfs: ioends require logically contiguous file offsets
xfs: use named array initializers for log item dumping
xfs: fix computation of inode btree maxlevels
xfs: reinitialise per-AG structures if geometry changes during recovery
xfs: remove xfs_trans_get_block_res
xfs: fix up inode32/64 (re)mount handling
xfs: fix format specifier , should be %llx and not %llu
xfs: sanitize remount options
xfs: convert mount option parsing to tokens
xfs: fix two memory leaks in xfs_attr_list.c error paths
xfs: XFS_DIFLAG2_DAX limited by PAGE_SIZE
xfs: dynamically switch modes when XFS_DIFLAG2_DAX is set/cleared
...

Linus Torvalds
2016-03-22 02:53:05 +0800

18 Mar, 2016

2 commits

faeb20ecf Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 ... Browse Code »

Pull ext4 updates from Ted Ts'o:
"Performance improvements in SEEK_DATA and xattr scalability
improvements, plus a lot of clean ups and bug fixes"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (38 commits)
ext4: clean up error handling in the MMP support
jbd2: do not fail journal because of frozen_buffer allocation failure
ext4: use __GFP_NOFAIL in ext4_free_blocks()
ext4: fix compile error while opening the macro DOUBLE_CHECK
ext4: print ext4 mount option data_err=abort correctly
ext4: fix NULL pointer dereference in ext4_mark_inode_dirty()
ext4: drop unneeded BUFFER_TRACE in ext4_delete_inline_entry()
ext4: fix misspellings in comments.
jbd2: fix FS corruption possibility in jbd2_journal_destroy() on umount path
ext4: more efficient SEEK_DATA implementation
ext4: cleanup handling of bh->b_state in DAX mmap
ext4: return hole from ext4_map_blocks()
ext4: factor out determining of hole size
ext4: fix setting of referenced bit in ext4_es_lookup_extent()
ext4: remove i_ioend_count
ext4: simplify io_end handling for AIO DIO
ext4: move trans handling and completion deferal out of _ext4_get_block
ext4: rename and split get blocks functions
ext4: use i_mutex to serialize unaligned AIO DIO
ext4: pack ioend structure better
...

Linus Torvalds
2016-03-18 07:31:18 +0800
70477371d Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 ... Browse Code »

Pull crypto update from Herbert Xu:
"Here is the crypto update for 4.6:

API:
- Convert remaining crypto_hash users to shash or ahash, also convert
blkcipher/ablkcipher users to skcipher.
- Remove crypto_hash interface.
- Remove crypto_pcomp interface.
- Add crypto engine for async cipher drivers.
- Add akcipher documentation.
- Add skcipher documentation.

Algorithms:
- Rename crypto/crc32 to avoid name clash with lib/crc32.
- Fix bug in keywrap where we zero the wrong pointer.

Drivers:
- Support T5/M5, T7/M7 SPARC CPUs in n2 hwrng driver.
- Add PIC32 hwrng driver.
- Support BCM6368 in bcm63xx hwrng driver.
- Pack structs for 32-bit compat users in qat.
- Use crypto engine in omap-aes.
- Add support for sama5d2x SoCs in atmel-sha.
- Make atmel-sha available again.
- Make sahara hashing available again.
- Make ccp hashing available again.
- Make sha1-mb available again.
- Add support for multiple devices in ccp.
- Improve DMA performance in caam.
- Add hashing support to rockchip"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (116 commits)
crypto: qat - remove redundant arbiter configuration
crypto: ux500 - fix checks of error code returned by devm_ioremap_resource()
crypto: atmel - fix checks of error code returned by devm_ioremap_resource()
crypto: qat - Change the definition of icp_qat_uof_regtype
hwrng: exynos - use __maybe_unused to hide pm functions
crypto: ccp - Add abstraction for device-specific calls
crypto: ccp - CCP versioning support
crypto: ccp - Support for multiple CCPs
crypto: ccp - Remove check for x86 family and model
crypto: ccp - memset request context to zero during import
lib/mpi: use "static inline" instead of "extern inline"
lib/mpi: avoid assembler warning
hwrng: bcm63xx - fix non device tree compatibility
crypto: testmgr - allow rfc3686 aes-ctr variants in fips mode.
crypto: qat - The AE id should be less than the maximal AE number
lib/mpi: Endianness fix
crypto: rockchip - add hash support for crypto engine in rk3288
crypto: xts - fix compile errors
crypto: doc - add skcipher API documentation
crypto: doc - update AEAD AD handling
...

Linus Torvalds
2016-03-18 02:22:54 +0800

14 Mar, 2016

3 commits

030468867 ext4: clean up error handling in the MMP support ... Browse Code »

There is memory leak as both caller function kmmpd() and callee
read_mmp_block() not releasing bh_check (i.e buffer_head).
Given patch fixes this problem.

[ Additional changes suggested by Andreas Dilger -- TYT ]

Signed-off-by: Jadhav Vikram
Signed-off-by: Theodore Ts'o

vikram.jadhav07
2016-03-14 05:56:52 +0800
adb7ef600 ext4: use __GFP_NOFAIL in ext4_free_blocks() ... Browse Code »

This might be unexpected but pages allocated for sbi->s_buddy_cache are
charged to current memory cgroup. So, GFP_NOFS allocation could fail if
current task has been killed by OOM or if current memory cgroup has no
free memory left. Block allocator cannot handle such failures here yet.

Signed-off-by: Konstantin Khlebnikov
Signed-off-by: Theodore Ts'o

Konstantin Khlebnikov
2016-03-14 05:29:06 +0800
a2821e34d ext4: fix compile error while opening the macro DOUBLE_CHECK ... Browse Code »

the error is:
fs/ext4/mballoc.c:475:43: error: 'struct ext4_group_info' has
no member named 'bb_bitmap'.
so, the definition of macro DOUBLE_CHECK should before
'struct ext4_group_info', I fixed it, and I moved the macro
AGGRESSIVE_CHECK together, because I think they shoule be together.

Signed-off-by: Aihua Zhang
Signed-off-by: Theodore Ts'o

Aihua Zhang
2016-03-14 05:18:12 +0800

13 Mar, 2016

2 commits

7915a861c ext4: print ext4 mount option data_err=abort correctly ... Browse Code »

If data_err=abort option is specified for an ext3/ext4 mount,
/proc/mounts does show it as "(null)". This is caused by token2str()
returning NULL for Opt_data_err_abort (due to its pattern containing
'=').

Signed-off-by: Ales Novak
Signed-off-by: Theodore Ts'o

Ales Novak
2016-03-13 10:55:50 +0800
5e1021f2b ext4: fix NULL pointer dereference in ext4_mark_inode_dirty() ... Browse Code »

ext4_reserve_inode_write() in ext4_mark_inode_dirty() could fail on
error (e.g. EIO) and iloc.bh can be NULL in this case. But the error is
ignored in the following "if" condition and ext4_expand_extra_isize()
might be called with NULL iloc.bh set, which triggers NULL pointer
dereference.

This is uncovered by commit 8b4953e13f4c ("ext4: reserve code points for
the project quota feature"), which enlarges the ext4_inode size, and
run the following script on new kernel but with old mke2fs:

#/bin/bash
mnt=/mnt/ext4
devname=ext4-error
dev=/dev/mapper/$devname
fsimg=/home/fs.img

trap cleanup 0 1 2 3 9 15

cleanup()
{
umount $mnt >/dev/null 2>&1
dmsetup remove $devname
losetup -d $backend_dev
rm -f $fsimg
exit 0
}

rm -f $fsimg
fallocate -l 1g $fsimg
backend_dev=`losetup -f --show $fsimg`
devsize=`blockdev --getsz $backend_dev`

good_tab="0 $devsize linear $backend_dev 0"
error_tab="0 $devsize error $backend_dev 0"

dmsetup create $devname --table "$good_tab"

mkfs -t ext4 $dev
mount -t ext4 -o errors=continue,strictatime $dev $mnt

dmsetup load $devname --table "$error_tab" && dmsetup resume $devname
echo 3 > /proc/sys/vm/drop_caches
ls -l $mnt
exit 0

[ Patch changed to simplify the function a tiny bit. -- Ted ]

Signed-off-by: Eryu Guan
Signed-off-by: Theodore Ts'o

Eryu Guan
2016-03-13 10:40:32 +0800

10 Mar, 2016

9 commits

a8ed9b869 ext4: drop unneeded BUFFER_TRACE in ext4_delete_inline_entry() ... Browse Code »

BUFFER_TRACE info "call ext4_handle_dirty_metadata" doesn't match the
code, so drop it.

Signed-off-by: Geliang Tang
Signed-off-by: Theodore Ts'o

Geliang Tang
2016-03-10 13:18:57 +0800
b8a07463c ext4: fix misspellings in comments. ... Browse Code »

Signed-off-by: Adam Buchbinder
Signed-off-by: Theodore Ts'o

Adam Buchbinder
2016-03-10 12:49:05 +0800
2d90c160e ext4: more efficient SEEK_DATA implementation ... Browse Code »

Using SEEK_DATA in a huge sparse file can easily lead to sotflockups as
ext4_seek_data() iterates hole block-by-block. Fix the problem by using
returned hole size from ext4_map_blocks() and thus skip the hole in one
go.

Update also SEEK_HOLE implementation to follow the same pattern as
SEEK_DATA to make future maintenance easier.

Furthermore we add cond_resched() to both ext4_seek_data() and
ext4_seek_hole() to avoid softlockups in case evil user creates huge
fragmented file and we have to go through lots of extents.

Signed-off-by: Jan Kara
Signed-off-by: Theodore Ts'o

Jan Kara
2016-03-10 12:11:13 +0800
e3fb8eb14 ext4: cleanup handling of bh->b_state in DAX mmap ... Browse Code »

ext4_dax_mmap_get_block() updates bh->b_state directly instead of using
ext4_update_bh_state(). This is mostly a cosmetic issue since DAX code
always passes on-stack buffer_head but clean this up to make code more
uniform.

Signed-off-by: Jan Kara
Signed-off-by: Theodore Ts'o

Jan Kara
2016-03-10 12:03:27 +0800
facab4d97 ext4: return hole from ext4_map_blocks() ... Browse Code »

Currently, ext4_map_blocks() just returns 0 when it finds a hole and
allocation is not requested. However we have all the information
available to tell how large the hole actually is and there are callers
of ext4_map_blocks() which would save some block-by-block hole iteration
if they knew this information. So fill in struct ext4_map_blocks even
for holes with the information we have. We keep returning 0 for holes to
maintain backward compatibility of the function.

Signed-off-by: Jan Kara
Signed-off-by: Theodore Ts'o

Jan Kara
2016-03-10 11:54:00 +0800
140a52508 ext4: factor out determining of hole size ... Browse Code »

ext4_ext_put_gap_in_cache() determines hole size in the extent tree,
then trims this with possible delayed allocated blocks, and inserts the
result into the extent status tree. Factor out determination of the size
of the hole in the extent tree as we will need this information in
ext4_ext_map_blocks() as well.

Signed-off-by: Jan Kara
Signed-off-by: Theodore Ts'o

Jan Kara
2016-03-10 11:46:57 +0800
718e47a57 Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 ... Browse Code »

Pull ext4 fix from Ted Ts'o:
"This fixes a regression which crept in v4.5-rc5"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: iterate over buffer heads correctly in move_extent_per_page()

Linus Torvalds
2016-03-10 11:33:05 +0800
87d8a74b5 ext4: fix setting of referenced bit in ext4_es_lookup_extent() ... Browse Code »

We were setting referenced bit on the extent structure we return from
ext4_es_lookup_extent() which is just a private structure on stack. Thus
setting had no effect. Set the bit in the structure in the status tree
instead.

Signed-off-by: Jan Kara
Signed-off-by: Theodore Ts'o

Jan Kara
2016-03-10 11:26:55 +0800
6ffe77bad ext4: iterate over buffer heads correctly in move_extent_per_page() ... Browse Code »

In commit bcff24887d00 ("ext4: don't read blocks from disk after extents
being swapped") bh is not updated correctly in the for loop and wrong
data has been written to disk. generic/324 catches this on sub-page
block size ext4.

Fixes: bcff24887d00 ("ext4: don't read blocks from disk after extentsbeing swapped")
Signed-off-by: Eryu Guan
Signed-off-by: Theodore Ts'o

Eryu Guan
2016-03-10 10:37:53 +0800

09 Mar, 2016

5 commits

600be30a8 ext4: remove i_ioend_count ... Browse Code »

Remove counter of pending io ends as it is unused.

Signed-off-by: Jan Kara
Signed-off-by: Theodore Ts'o

Jan Kara
2016-03-09 12:39:21 +0800
109811c20 ext4: simplify io_end handling for AIO DIO ... Browse Code »

When mapping blocks for direct IO, we allocate io_end structure before
mapping blocks and store pointer to it in the inode. This creates a
requirement that any AIO DIO using io_end must be protected by i_mutex.
This created problems in the past with dioread_nolock mode which was
corrupting io_end pointers. Also io_end is allocated unnecessarily in
case where we don't need to convert any extents (which is a common case
for example when overwriting file).

We fix the problem by allocating io_end only once we return unwritten
extent from block mapping function for AIO DIO (so we can save some
pointless io_end allocations) and we pass pointer to it in bh->b_private
which generic DIO code later passes to our end IO callback. That way we
remove any need for global pointer to io_end structure and thus fix the
races.

The downside of this change is that the checking for unwritten IO in
flight in ext4_extents_can_be_merged() is more racy since we now
increment i_unwritten / set EXT4_STATE_DIO_UNWRITTEN only after dropping
i_data_sem. However the check has been racy already before because
ext4_writepages() already increment i_unwritten after dropping
i_data_sem and reserved blocks save us from hitting ENOSPC in the worst
case.

Signed-off-by: Jan Kara

Jan Kara
2016-03-09 12:36:46 +0800
efe70c295 ext4: move trans handling and completion deferal out of _ext4_get_block ... Browse Code »

There is no need to handle starting of a transaction and deferal of DIO
completion in _ext4_get_block() function. We can move this out to get
block functions for direct IO that need it. That way we can add stricter
checks verifying things work as we expect.

Signed-off-by: Jan Kara
Signed-off-by: Theodore Ts'o

Jan Kara
2016-03-09 12:35:46 +0800
705965bd6 ext4: rename and split get blocks functions ... Browse Code »

Rename ext4_get_blocks_write() to ext4_get_blocks_unwritten() to better
describe what it does. Also split out get blocks functions for direct
IO. Later we move functionality from _ext4_get_blocks() there. There's no
functional change in this patch.

Signed-off-by: Jan Kara
Signed-off-by: Theodore Ts'o

Jan Kara
2016-03-09 12:08:10 +0800
e142d0526 ext4: use i_mutex to serialize unaligned AIO DIO ... Browse Code »

Currently we've used hashed aio_mutex to serialize unaligned AIO DIO.
However the code cleanups that happened after 2011 when the lock was
introduced made aio_mutex acquired at almost the same places where we
already have exclusion using i_mutex. So just use i_mutex for the
exclusion of unaligned AIO DIO.

The change moves waiting for pending unwritten extent conversion under
i_mutex. That makes special handling of O_APPEND writes unnecessary and
also avoids possible livelocking of unaligned AIO DIO with aligned one
(nothing was preventing contiguous stream of aligned AIO DIOs to let
unaligned AIO DIO wait forever).

Signed-off-by: Jan Kara
Signed-off-by: Theodore Ts'o

Jan Kara
2016-03-09 11:44:50 +0800