Eric Lee / smarc-fsl-linux-kernel

03 Aug, 2018

2 commits

f547aa20b ext4: fix check to prevent initializing reserved inodes ... Browse Code »

commit 5012284700775a4e6e3fbe7eac4c543c4874b559 upstream.

Commit 8844618d8aa7: "ext4: only look at the bg_flags field if it is
valid" will complain if block group zero does not have the
EXT4_BG_INODE_ZEROED flag set. Unfortunately, this is not correct,
since a freshly created file system has this flag cleared. It gets
almost immediately after the file system is mounted read-write --- but
the following somewhat unlikely sequence will end up triggering a
false positive report of a corrupted file system:

mkfs.ext4 /dev/vdc
mount -o ro /dev/vdc /vdc
mount -o remount,rw /dev/vdc

Instead, when initializing the inode table for block group zero, test
to make sure that itable_unused count is not too large, since that is
the case that will result in some or all of the reserved inodes
getting cleared.

This fixes the failures reported by Eric Whiteney when running
generic/230 and generic/231 in the the nojournal test case.

Fixes: 8844618d8aa7 ("ext4: only look at the bg_flags field if it is valid")
Reported-by: Eric Whitney
Signed-off-by: Theodore Ts'o
Signed-off-by: Greg Kroah-Hartman

Theodore Ts'o
2018-08-03 13:50:43 +0800
dc1b4b710 ext4: check for allocation block validity with block group locked ... Browse Code »

commit 8d5a803c6a6ce4ec258e31f76059ea5153ba46ef upstream.

With commit 044e6e3d74a3: "ext4: don't update checksum of new
initialized bitmaps" the buffer valid bit will get set without
actually setting up the checksum for the allocation bitmap, since the
checksum will get calculated once we actually allocate an inode or
block.

If we are doing this, then we need to (re-)check the verified bit
after we take the block group lock. Otherwise, we could race with
another process reading and verifying the bitmap, which would then
complain about the checksum being invalid.

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1780137

Signed-off-by: Theodore Ts'o
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman

Theodore Ts'o
2018-08-03 13:50:43 +0800

11 Jul, 2018

1 commit

44a4bc970 ext4: only look at the bg_flags field if it is valid ... Browse Code »

commit 8844618d8aa7a9973e7b527d038a2a589665002c upstream.

The bg_flags field in the block group descripts is only valid if the
uninit_bg or metadata_csum feature is enabled. We were not
consistently looking at this field; fix this.

Also block group #0 must never have uninitialized allocation bitmaps,
or need to be zeroed, since that's where the root inode, and other
special inodes are set up. Check for these conditions and mark the
file system as corrupted if they are detected.

This addresses CVE-2018-10876.

https://bugzilla.kernel.org/show_bug.cgi?id=199403

Signed-off-by: Theodore Ts'o
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman

Theodore Ts'o
2018-07-11 22:29:17 +0800

02 May, 2018

1 commit

b39430ea0 ext4: add validity checks for bitmap block numbers ... Browse Code »

commit 7dac4a1726a9c64a517d595c40e95e2d0d135f6f upstream.

An privileged attacker can cause a crash by mounting a crafted ext4
image which triggers a out-of-bounds read in the function
ext4_valid_block_bitmap() in fs/ext4/balloc.c.

This issue has been assigned CVE-2018-1093.

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=199181
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1560782
Reported-by: Wen Xu
Signed-off-by: Theodore Ts'o
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman

Theodore Ts'o
2018-05-02 03:58:07 +0800

24 Apr, 2018

1 commit

bd499f553 ext4: don't update checksum of new initialized bitmaps ... Browse Code »

commit 044e6e3d74a3d7103a0c8a9305dfd94d64000660 upstream.

When reading the inode or block allocation bitmap, if the bitmap needs
to be initialized, do not update the checksum in the block group
descriptor. That's because we're not set up to journal those changes.
Instead, just set the verified bit on the bitmap block, so that it's
not necessary to validate the checksum.

When a block or inode allocation actually happens, at that point the
checksum will be calculated, and update of the bg descriptor block
will be properly journalled.

Signed-off-by: Theodore Ts'o
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman

Theodore Ts'o
2018-04-24 15:36:30 +0800

20 Dec, 2017

1 commit

0228af23d ext4: add missing error check in __ext4_new_inode() ... Browse Code »

commit 996fc4477a0ea28226b30d175f053fb6f9a4fa36 upstream.

It's possible for ext4_get_acl() to return an ERR_PTR. So we need to
add a check for this case in __ext4_new_inode(). Otherwise on an
error we can end up oops the kernel.

This was getting triggered by xfstests generic/388, which is a test
which exercises the shutdown code path.

Signed-off-by: Theodore Ts'o
Signed-off-by: Greg Kroah-Hartman

Theodore Ts'o
2017-12-20 17:10:22 +0800

02 Nov, 2017

1 commit

b24413180 License cleanup: add SPDX GPL-2.0 license identifier to files with no license ... Browse Code »

Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if
Reviewed-by: Philippe Ombredanne
Reviewed-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
2017-11-02 18:10:55 +0800

15 Sep, 2017

1 commit

0f0d12728 Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull mount flag updates from Al Viro:
"Another chunk of fmount preparations from dhowells; only trivial
conflicts for that part. It separates MS_... bits (very grotty
mount(2) ABI) from the struct super_block ->s_flags (kernel-internal,
only a small subset of MS_... stuff).

This does *not* convert the filesystems to new constants; only the
infrastructure is done here. The next step in that series is where the
conflicts would be; that's the conversion of filesystems. It's purely
mechanical and it's better done after the merge, so if you could run
something like

list=$(for i in MS_RDONLY MS_NOSUID MS_NODEV MS_NOEXEC MS_SYNCHRONOUS MS_MANDLOCK MS_DIRSYNC MS_NOATIME MS_NODIRATIME MS_SILENT MS_POSIXACL MS_KERNMOUNT MS_I_VERSION MS_LAZYTIME; do git grep -l $i fs drivers/staging/lustre drivers/mtd ipc mm include/linux; done|sort|uniq|grep -v '^fs/namespace.c$')

sed -i -e 's/\/SB_RDONLY/g' \
-e 's/\/SB_NOSUID/g' \
-e 's/\/SB_NODEV/g' \
-e 's/\/SB_NOEXEC/g' \
-e 's/\/SB_SYNCHRONOUS/g' \
-e 's/\/SB_MANDLOCK/g' \
-e 's/\/SB_DIRSYNC/g' \
-e 's/\/SB_NOATIME/g' \
-e 's/\/SB_NODIRATIME/g' \
-e 's/\/SB_SILENT/g' \
-e 's/\/SB_POSIXACL/g' \
-e 's/\/SB_KERNMOUNT/g' \
-e 's/\/SB_I_VERSION/g' \
-e 's/\/SB_LAZYTIME/g' \
$list

and commit it with something along the lines of 'convert filesystems
away from use of MS_... constants' as commit message, it would save a
quite a bit of headache next cycle"

* 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
VFS: Differentiate mount flags (MS_*) from internal superblock flags
VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)
vfs: Add sb_rdonly(sb) to query the MS_RDONLY flag on s_flags

Linus Torvalds
2017-09-15 09:54:01 +0800

31 Aug, 2017

1 commit

b5f515735 ext4: avoid Y2038 overflow in recently_deleted() ... Browse Code »

Avoid a 32-bit time overflow in recently_deleted() since i_dtime
(inode deletion time) is stored only as a 32-bit value on disk.
Since i_dtime isn't used for much beyond a boolean value in e2fsck
and is otherwise only used in this function in the kernel, there is
no benefit to use more space in the inode for this field on disk.

Instead, compare only the relative deletion time with the low
32 bits of the time using the newly-added time_before32() helper,
which is similar to time_before() and time_after() for jiffies.

Increase RECENTCY_DIRTY to 300s based on Ted's comments about
usage experience at Google.

Signed-off-by: Andreas Dilger
Signed-off-by: Theodore Ts'o
Reviewed-by: Arnd Bergmann

Andreas Dilger
2017-08-31 23:09:45 +0800

25 Aug, 2017

1 commit

901ed070d ext4: reduce lock contention in __ext4_new_inode ... Browse Code »

While running number of creating file threads concurrently,
we found heavy lock contention on group spinlock:

FUNC TOTAL_TIME(us) COUNT AVG(us)
ext4_create 1707443399 1440000 1185.72
_raw_spin_lock 1317641501 180899929 7.28
jbd2__journal_start 287821030 1453950 197.96
jbd2_journal_get_write_access 33441470 73077185 0.46
ext4_add_nondir 29435963 1440000 20.44
ext4_add_entry 26015166 1440049 18.07
ext4_dx_add_entry 25729337 1432814 17.96
ext4_mark_inode_dirty 12302433 5774407 2.13

most of cpu time blames to _raw_spin_lock, here is some testing
numbers with/without patch.

Test environment:
Server : SuperMicro Sever (2 x E5-2690 v3@2.60GHz, 128GB 2133MHz
DDR4 Memory, 8GbFC)
Storage : 2 x RAID1 (DDN SFA7700X, 4 x Toshiba PX02SMU020 200GB
Read Intensive SSD)

format command:
mkfs.ext4 -J size=4096

test command:
mpirun -np 48 mdtest -n 30000 -d /ext4/mdtest.out -F -C \
-r -i 1 -v -p 10 -u #first run to load inode

mpirun -np 48 mdtest -n 30000 -d /ext4/mdtest.out -F -C \
-r -i 3 -v -p 10 -u

Kernel version: 4.13.0-rc3

Test 1,440,000 files with 48 directories by 48 processes:

Without patch:

File Creation File removal
79,033 289,569 ops/per second
81,463 285,359
79,875 288,475

With patch:
File Creation File removal
810669 301694
812805 302711
813965 297670

Creation performance is improved more than 10X with large
journal size. The main problem here is we test bitmap
and do some check and journal operations which could be
slept, then we test and set with lock hold, this could
be racy, and make 'inode' steal by other process.

However, after first try, we could confirm handle has
been started and inode bitmap journaled too, then
we could find and set bit with lock hold directly, this
will mostly gurateee success with second try.

Tested-by: Shuichi Ihara
Signed-off-by: Wang Shilong
Signed-off-by: Theodore Ts'o
Reviewed-by: Jan Kara

Wang Shilong
2017-08-25 00:56:35 +0800

24 Aug, 2017

2 commits

2fe435d8b ext4: cleanup goto next group ... Browse Code »

avoid duplicated codes, also we need goto
next group in case we found reserved inode.

Signed-off-by: Wang Shilong
Signed-off-by: Theodore Ts'o
Reviewed-by: Jan Kara

Wang Shilong
2017-08-24 23:58:18 +0800
4f9d956d1 ext4: do not unnecessarily allocate buffer in recently_deleted() ... Browse Code »

In recently_deleted() function we want to check whether inode is still
cached in buffer cache. Use sb_find_get_block() for that instead of
sb_getblk() to avoid unnecessary allocation of bdev page and buffer
heads.

Signed-off-by: Jan Kara
Signed-off-by: Theodore Ts'o

Jan Kara
2017-08-24 23:52:21 +0800

17 Jul, 2017

1 commit

bc98a42c1 VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb) ... Browse Code »

Firstly by applying the following with coccinelle's spatch:

@@ expression SB; @@
-SB->s_flags & MS_RDONLY
+sb_rdonly(SB)

to effect the conversion to sb_rdonly(sb), then by applying:

@@ expression A, SB; @@
(
-(!sb_rdonly(SB)) && A
+!sb_rdonly(SB) && A
|
-A != (sb_rdonly(SB))
+A != sb_rdonly(SB)
|
-A == (sb_rdonly(SB))
+A == sb_rdonly(SB)
|
-!(sb_rdonly(SB))
+!sb_rdonly(SB)
|
-A && (sb_rdonly(SB))
+A && sb_rdonly(SB)
|
-A || (sb_rdonly(SB))
+A || sb_rdonly(SB)
|
-(sb_rdonly(SB)) != A
+sb_rdonly(SB) != A
|
-(sb_rdonly(SB)) == A
+sb_rdonly(SB) == A
|
-(sb_rdonly(SB)) && A
+sb_rdonly(SB) && A
|
-(sb_rdonly(SB)) || A
+sb_rdonly(SB) || A
)

@@ expression A, B, SB; @@
(
-(sb_rdonly(SB)) ? 1 : 0
+sb_rdonly(SB)
|
-(sb_rdonly(SB)) ? A : B
+sb_rdonly(SB) ? A : B
)

to remove left over excess bracketage and finally by applying:

@@ expression A, SB; @@
(
-(A & MS_RDONLY) != sb_rdonly(SB)
+(bool)(A & MS_RDONLY) != sb_rdonly(SB)
|
-(A & MS_RDONLY) == sb_rdonly(SB)
+(bool)(A & MS_RDONLY) == sb_rdonly(SB)
)

to make comparisons against the result of sb_rdonly() (which is a bool)
work correctly.

Signed-off-by: David Howells

David Howells
2017-07-17 15:45:34 +0800

06 Jul, 2017

2 commits

af65207c7 ext4: fix __ext4_new_inode() journal credits calculation ... Browse Code »

ea_inode feature allows creating extended attributes that are up to
64k in size. Update __ext4_new_inode() to pick increased credit limits.

To avoid overallocating too many journal credits, update
__ext4_xattr_set_credits() to make a distinction between xattr create
vs update. This helps __ext4_new_inode() because all attributes are
known to be new, so we can save credits that are normally needed to
delete old values.

Also, have fscrypt specify its maximum context size so that we don't
end up allocating credits for 64k size.

Signed-off-by: Tahsin Erdogan
Signed-off-by: Theodore Ts'o

Tahsin Erdogan
2017-07-06 12:01:59 +0800
ad47f9533 ext4: skip ext4_init_security() and encryption on ea_inodes ... Browse Code »

Extended attribute inodes are internal to ext4. Adding encryption/security
related attributes on them would mean dealing with nested calls into ea code.
Since they have no direct exposure to user mode, just avoid creating ea
entries for them.

Signed-off-by: Tahsin Erdogan
Signed-off-by: Theodore Ts'o

Tahsin Erdogan
2017-07-06 12:00:59 +0800

22 Jun, 2017

2 commits

1b917ed8a ext4: do not set posix acls on xattr inodes ... Browse Code »

We don't need acls on xattr inodes because they are not directly
accessible from user mode.

Besides lockdep complains about recursive locking of xattr_sem as seen
below.

=============================================
[ INFO: possible recursive locking detected ]
4.11.0-rc8+ #402 Not tainted
---------------------------------------------
python/1894 is trying to acquire lock:
(&ei->xattr_sem){++++..}, at: [] ext4_xattr_get+0x66/0x270

but task is already holding lock:
(&ei->xattr_sem){++++..}, at: [] ext4_xattr_set_handle+0xa0/0x5d0

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(&ei->xattr_sem);
lock(&ei->xattr_sem);

*** DEADLOCK ***

May be due to missing lock nesting notation

3 locks held by python/1894:
#0: (sb_writers#10){.+.+.+}, at: [] mnt_want_write+0x1f/0x50
#1: (&sb->s_type->i_mutex_key#15){+.+...}, at: [] vfs_setxattr+0x57/0xb0
#2: (&ei->xattr_sem){++++..}, at: [] ext4_xattr_set_handle+0xa0/0x5d0

stack backtrace:
CPU: 0 PID: 1894 Comm: python Not tainted 4.11.0-rc8+ #402
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
dump_stack+0x67/0x99
__lock_acquire+0x5f3/0x1830
lock_acquire+0xb5/0x1d0
down_read+0x2f/0x60
ext4_xattr_get+0x66/0x270
ext4_get_acl+0x43/0x1e0
get_acl+0x72/0xf0
posix_acl_create+0x5e/0x170
ext4_init_acl+0x21/0xc0
__ext4_new_inode+0xffd/0x16b0
ext4_xattr_set_entry+0x5ea/0xb70
ext4_xattr_block_set+0x1b5/0x970
ext4_xattr_set_handle+0x351/0x5d0
ext4_xattr_set+0x124/0x180
ext4_xattr_user_set+0x34/0x40
__vfs_setxattr+0x66/0x80
__vfs_setxattr_noperm+0x69/0x1c0
vfs_setxattr+0xa2/0xb0
setxattr+0x129/0x160
path_setxattr+0x87/0xb0
SyS_setxattr+0xf/0x20
entry_SYSCALL_64_fastpath+0x18/0xad

Signed-off-by: Tahsin Erdogan
Signed-off-by: Theodore Ts'o

Tahsin Erdogan
2017-06-22 09:21:39 +0800
e50e5129f ext4: xattr-in-inode support ... Browse Code »

Large xattr support is implemented for EXT4_FEATURE_INCOMPAT_EA_INODE.

If the size of an xattr value is larger than will fit in a single
external block, then the xattr value will be saved into the body
of an external xattr inode.

The also helps support a larger number of xattr, since only the headers
will be stored in the in-inode space or the single external block.

The inode is referenced from the xattr header via "e_value_inum",
which was formerly "e_value_block", but that field was never used.
The e_value_size still contains the xattr size so that listing
xattrs does not need to look up the inode if the data is not accessed.

struct ext4_xattr_entry {
__u8 e_name_len; /* length of name */
__u8 e_name_index; /* attribute name index */
__le16 e_value_offs; /* offset in disk block of value */
__le32 e_value_inum; /* inode in which value is stored */
__le32 e_value_size; /* size of attribute value */
__le32 e_hash; /* hash value of name and value */
char e_name[0]; /* attribute name */
};

The xattr inode is marked with the EXT4_EA_INODE_FL flag and also
holds a back-reference to the owning inode in its i_mtime field,
allowing the ext4/e2fsck to verify the correct inode is accessed.

[ Applied fix by Dan Carpenter to avoid freeing an ERR_PTR. ]

Lustre-Jira: https://jira.hpdd.intel.com/browse/LU-80
Lustre-bugzilla: https://bugzilla.lustre.org/show_bug.cgi?id=4424
Signed-off-by: Kalpak Shah
Signed-off-by: James Simmons
Signed-off-by: Andreas Dilger
Signed-off-by: Tahsin Erdogan
Signed-off-by: Theodore Ts'o
Signed-off-by: Dan Carpenter

Andreas Dilger
2017-06-22 09:10:32 +0800

02 May, 2017

1 commit

aa1dca3bd ext4: inherit encryption xattr before other xattrs ... Browse Code »

When using both encryption and SELinux (or another feature that requires
an xattr per file) on a filesystem with 256-byte inodes, each file's
xattrs usually spill into an external xattr block. Currently, the
xattrs are inherited in the order ACL, security, then encryption.
Therefore, if spillage occurs, the encryption xattr will always end up
in the external block. This is not ideal because the encryption xattrs
contain a nonce, so they will always be unique and will prevent the
external xattr blocks from being deduplicated.

To improve the situation, change the inheritance order to encryption,
ACL, then security. This gives the encryption xattr a better chance to
be stored in-inode, allowing the other xattr(s) to be deduplicated.

Note that it may be better for userspace to format the filesystem with
512-byte inodes in this case. However, it's not the default.

Signed-off-by: Eric Biggers
Signed-off-by: Theodore Ts'o

Eric Biggers
2017-05-02 12:49:54 +0800

02 Mar, 2017

1 commit

5b825c3af sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h> ... Browse Code »

Add #include dependencies to all .c files rely on sched.h
doing that for them.

Note that even if the count where we need to add extra headers seems high,
it's still a net win, because is included in over
2,200 files ...

Acked-by: Linus Torvalds
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar

Ingo Molnar
2017-03-02 15:42:31 +0800

05 Feb, 2017

1 commit

0db1ff222 ext4: add shutdown bit and check for it ... Browse Code »

Add a shutdown bit that will cause ext4 processing to fail immediately
with EIO.

Signed-off-by: Theodore Ts'o

Theodore Ts'o
2017-02-05 14:28:48 +0800

01 Jan, 2017

1 commit

54475f531 fscrypt: use ENOKEY when file cannot be created w/o key ... Browse Code »

As part of an effort to clean up fscrypt-related error codes, make
attempting to create a file in an encrypted directory that hasn't been
"unlocked" fail with ENOKEY. Previously, several error codes were used
for this case, including ENOENT, EACCES, and EPERM, and they were not
consistent between and within filesystems. ENOKEY is a better choice
because it expresses that the failure is due to lacking the encryption
key. It also matches the error code returned when trying to open an
encrypted regular file without the key.

I am not aware of any users who might be relying on the previous
inconsistent error codes, which were never documented anywhere.

This failure case will be exercised by an xfstest.

Signed-off-by: Eric Biggers
Signed-off-by: Theodore Ts'o

Eric Biggers
2017-01-01 05:26:20 +0800

22 Nov, 2016

1 commit

2f8f5e76c ext4: avoid lockdep warning when inheriting encryption context ... Browse Code »

On a lockdep-enabled kernel, xfstests generic/027 fails due to a lockdep
warning when run on ext4 mounted with -o test_dummy_encryption:

xfs_io/4594 is trying to acquire lock:
(jbd2_handle
){++++.+}, at:
[] jbd2_log_wait_commit+0x5/0x11b

but task is already holding lock:
(jbd2_handle
){++++.+}, at:
[] start_this_handle+0x354/0x3d8

The abbreviated call stack is:

[] ? jbd2_log_wait_commit+0x5/0x11b
[] jbd2_log_wait_commit+0x40/0x11b
[] ? jbd2_log_wait_commit+0x5/0x11b
[] ? __jbd2_journal_force_commit+0x76/0xa6
[] __jbd2_journal_force_commit+0x91/0xa6
[] jbd2_journal_force_commit_nested+0xe/0x18
[] ext4_should_retry_alloc+0x72/0x79
[] ext4_xattr_set+0xef/0x11f
[] ext4_set_context+0x3a/0x16b
[] fscrypt_inherit_context+0xe3/0x103
[] __ext4_new_inode+0x12dc/0x153a
[] ext4_create+0xb7/0x161

When a file is created in an encrypted directory, ext4_set_context() is
called to set an encryption context on the new file. This calls
ext4_xattr_set(), which contains a retry loop where the journal is
forced to commit if an ENOSPC error is encountered.

If the task actually were to wait for the journal to commit in this
case, then it would deadlock because a handle remains open from
__ext4_new_inode(), so the running transaction can't be committed yet.
Fortunately, __jbd2_journal_force_commit() avoids the deadlock by not
allowing the running transaction to be committed while the current task
has it open. However, the above lockdep warning is still triggered.

This was a false positive which was introduced by: 1eaa566d368b: jbd2:
track more dependencies on transaction commit

Fix the problem by passing the handle through the 'fs_data' argument to
ext4_set_context(), then using ext4_xattr_set_handle() instead of
ext4_xattr_set(). And in the case where no journal handle is specified
and ext4_set_context() has to open one, add an ENOSPC retry loop since
in that case it is the outermost transaction.

Signed-off-by: Eric Biggers

Eric Biggers
2016-11-22 00:52:44 +0800

15 Nov, 2016

1 commit

eeca7ea1b ext4: use current_time() for inode timestamps ... Browse Code »

CURRENT_TIME_SEC and CURRENT_TIME are not y2038 safe.
current_time() will be transitioned to be y2038 safe
along with vfs.

current_time() returns timestamps according to the
granularities set in the super_block.
The granularity check in ext4_current_time() to call
current_time() or CURRENT_TIME_SEC is not required.
Use current_time() directly to obtain timestamps
unconditionally, and remove ext4_current_time().

Quota files are assumed to be on the same filesystem.
Hence, use current_time() for these files as well.

Signed-off-by: Deepa Dinamani
Signed-off-by: Theodore Ts'o
Reviewed-by: Arnd Bergmann

Deepa Dinamani
2016-11-15 10:40:10 +0800

06 Sep, 2016

1 commit

0b7b77791 ext4: remove old feature helpers ... Browse Code »

Use the ext4_{has,set,clear}_feature_* helpers to replace the old
feature helpers.

Signed-off-by: Kaho Ng
Signed-off-by: Theodore Ts'o
Reviewed-by: Jan Kara
Reviewed-by: Darrick J. Wong

Kaho Ng
2016-09-06 11:11:58 +0800

27 Jul, 2016

1 commit

396d10993 Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 ... Browse Code »

Pull ext4 updates from Ted Ts'o:
"The major change this cycle is deleting ext4's copy of the file system
encryption code and switching things over to using the copies in
fs/crypto. I've updated the MAINTAINERS file to add an entry for
fs/crypto listing Jaeguk Kim and myself as the maintainers.

There are also a number of bug fixes, most notably for some problems
found by American Fuzzy Lop (AFL) courtesy of Vegard Nossum. Also
fixed is a writeback deadlock detected by generic/130, and some
potential races in the metadata checksum code"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (21 commits)
ext4: verify extent header depth
ext4: short-cut orphan cleanup on error
ext4: fix reference counting bug on block allocation error
MAINTAINRES: fs-crypto maintainers update
ext4 crypto: migrate into vfs's crypto engine
ext2: fix filesystem deadlock while reading corrupted xattr block
ext4: fix project quota accounting without quota limits enabled
ext4: validate s_reserved_gdt_blocks on mount
ext4: remove unused page_idx
ext4: don't call ext4_should_journal_data() on the journal inode
ext4: Fix WARN_ON_ONCE in ext4_commit_super()
ext4: fix deadlock during page writeback
ext4: correct error value of function verifying dx checksum
ext4: avoid modifying checksum fields directly during checksum verification
ext4: check for extents that wrap around
jbd2: make journal y2038 safe
jbd2: track more dependencies on transaction commit
jbd2: move lockdep tracking to journal_s
jbd2: move lockdep instrumentation for jbd2 handles
ext4: respect the nobarrier mount option in nojournal mode
...

Linus Torvalds
2016-07-27 09:35:55 +0800

11 Jul, 2016

1 commit

a7550b30a ext4 crypto: migrate into vfs's crypto engine ... Browse Code »

This patch removes the most parts of internal crypto codes.
And then, it modifies and adds some ext4-specific crypt codes to use the generic
facility.

Signed-off-by: Jaegeuk Kim
Signed-off-by: Theodore Ts'o

Jaegeuk Kim
2016-07-11 02:01:03 +0800

08 Jun, 2016

1 commit

2a222ca99 fs: have submit_bh users pass in op and flags separately ... Browse Code »

This has submit_bh users pass in the operation and flags separately,
so submit_bh_wbc can setup the bio op and bi_rw flags on the bio that
is submitted.

Signed-off-by: Mike Christie
Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe

Mike Christie
2016-06-08 03:41:38 +0800

30 Apr, 2016

2 commits

7827a7f6e ext4: clean up error handling when orphan list is corrupted ... Browse Code »

Instead of just printing warning messages, if the orphan list is
corrupted, declare the file system is corrupted. If there are any
reserved inodes in the orphaned inode list, declare the file system
corrupted and stop right away to avoid doing more potential damage to
the file system.

Cc: stable@vger.kernel.org
Signed-off-by: Theodore Ts'o

Theodore Ts'o
2016-04-30 12:49:54 +0800
c9eb13a91 ext4: fix hang when processing corrupted orphaned inode list ... Browse Code »

If the orphaned inode list contains inode #5, ext4_iget() returns a
bad inode (since the bootloader inode should never be referenced
directly). Because of the bad inode, we end up processing the inode
repeatedly and this hangs the machine.

This can be reproduced via:

mke2fs -t ext4 /tmp/foo.img 100
debugfs -w -R "ssv last_orphan 5" /tmp/foo.img
mount -o loop /tmp/foo.img /mnt

(But don't do this if you are using an unpatched kernel if you care
about the system staying functional. :-)

This bug was found by the port of American Fuzzy Lop into the kernel
to find file system problems[1]. (Since it *only* happens if inode #5
shows up on the orphan list --- 3, 7, 8, etc. won't do it, it's not
surprising that AFL needed two hours before it found it.)

[1] http://events.linuxfoundation.org/sites/events/files/slides/AFL%20filesystem%20fuzzing%2C%20Vault%202016_0.pdf

Cc: stable@vger.kernel.org
Reported by: Vegard Nossum
Signed-off-by: Theodore Ts'o

Theodore Ts'o
2016-04-30 12:48:54 +0800

10 Mar, 2016

1 commit

b8a07463c ext4: fix misspellings in comments. ... Browse Code »

Signed-off-by: Adam Buchbinder
Signed-off-by: Theodore Ts'o

Adam Buchbinder
2016-03-10 12:49:05 +0800

12 Feb, 2016

1 commit

05145bd79 ext4: fix scheduling in atomic on group checksum failure ... Browse Code »

When block group checksum is wrong, we call ext4_error() while holding
group spinlock from ext4_init_block_bitmap() or
ext4_init_inode_bitmap() which results in scheduling while in atomic.
Fix the issue by calling ext4_error() later after dropping the spinlock.

CC: stable@vger.kernel.org
Reported-by: Dmitry Vyukov
Signed-off-by: Jan Kara
Signed-off-by: Theodore Ts'o
Reviewed-by: Darrick J. Wong

Jan Kara
2016-02-12 12:15:12 +0800

09 Jan, 2016

1 commit

040cb3786 ext4: adds project ID support ... Browse Code »

Signed-off-by: Li Xi
Signed-off-by: Theodore Ts'o
Reviewed-by: Andreas Dilger
Reviewed-by: Jan Kara

Li Xi
2016-01-09 05:01:21 +0800

18 Oct, 2015

3 commits

9008a58e5 ext4: make the bitmap read routines return real error codes ... Browse Code »

Make the bitmap reaading routines return real error codes (EIO,
EFSCORRUPTED, EFSBADCRC) which can then be reflected back to
userspace for more precise diagnosis work.

In particular, this means that mballoc no longer claims that we're out
of memory if the block bitmaps become corrupt.

Signed-off-by: Darrick J. Wong
Signed-off-by: Theodore Ts'o

Darrick J. Wong
2015-10-18 09:33:24 +0800
e2b911c53 ext4: clean up feature test macros with predicate functions ... Browse Code »

Create separate predicate functions to test/set/clear feature flags,
thereby replacing the wordy old macros. Furthermore, clean out the
places where we open-coded feature tests.

Signed-off-by: Darrick J. Wong

Darrick J. Wong
2015-10-18 04:18:43 +0800
6a797d273 ext4: call out CRC and corruption errors with specific error codes ... Browse Code »

Instead of overloading EIO for CRC errors and corrupt structures,
return the same error codes that XFS returns for the same issues.

Signed-off-by: Darrick J. Wong
Signed-off-by: Theodore Ts'o

Darrick J. Wong
2015-10-18 04:16:04 +0800

24 Jul, 2015

1 commit

a7cdadee0 ext4: Handle error from dquot_initialize() ... Browse Code »

dquot_initialize() can now return error. Handle it where possible.

Acked-by: Theodore Ts'o
Signed-off-by: Jan Kara

Jan Kara
2015-07-24 02:59:37 +0800

01 Jun, 2015

1 commit

e709e9df6 ext4 crypto: encrypt tmpfile located in encryption protected directory ... Browse Code »

Factor out calls to ext4_inherit_context() and move them to
__ext4_new_inode(); this fixes a problem where ext4_tmpfile() wasn't
calling calling ext4_inherit_context(), so the temporary file wasn't
getting protected. Since the blocks for the tmpfile could end up on
disk, they really should be protected if the tmpfile is created within
the context of an encrypted directory.

Signed-off-by: Theodore Ts'o

Theodore Ts'o
2015-06-01 01:35:02 +0800

19 May, 2015

1 commit

f5aed2c2a ext4: clean up superblock encryption mode fields ... Browse Code »

The superblock fields s_file_encryption_mode and s_dir_encryption_mode
are vestigal, so remove them as a cleanup. While we're at it, allow
file systems with both encryption and inline_data enabled at the same
time to work correctly. We can't have encrypted inodes with inline
data, but there's no reason to prohibit unencrypted inodes from using
the inline data feature.

Signed-off-by: Theodore Ts'o

Theodore Ts'o
2015-05-19 01:18:47 +0800

27 Apr, 2015

1 commit

9ec3a646f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull fourth vfs update from Al Viro:
"d_inode() annotations from David Howells (sat in for-next since before
the beginning of merge window) + four assorted fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
RCU pathwalk breakage when running into a symlink overmounting something
fix I_DIO_WAKEUP definition
direct-io: only inc/dec inode->i_dio_count for file systems
fs/9p: fix readdir()
VFS: assorted d_backing_inode() annotations
VFS: fs/inode.c helpers: d_inode() annotations
VFS: fs/cachefiles: d_backing_inode() annotations
VFS: fs library helpers: d_inode() annotations
VFS: assorted weird filesystems: d_inode() annotations
VFS: normal filesystems (and lustre): d_inode() annotations
VFS: security/: d_inode() annotations
VFS: security/: d_backing_inode() annotations
VFS: net/: d_inode() annotations
VFS: net/unix: d_backing_inode() annotations
VFS: kernel/: d_inode() annotations
VFS: audit: d_backing_inode() annotations
VFS: Fix up some ->d_inode accesses in the chelsio driver
VFS: Cachefiles should perform fs modifications on the top layer only
VFS: AF_UNIX sockets should call mknod on the top layer only

Linus Torvalds
2015-04-27 08:22:07 +0800

16 Apr, 2015

1 commit

6ddb24478 ext4 crypto: enable encryption feature flag ... Browse Code »

Also add the test dummy encryption mode flag so we can more easily
test the encryption patches using xfstests.

Signed-off-by: Michael Halcrow
Signed-off-by: Theodore Ts'o

Theodore Ts'o
2015-04-16 13:56:00 +0800