06 Sep, 2016
1 commit
-
Use the ext4_{has,set,clear}_feature_* helpers to replace the old
feature helpers.

Signed-off-by: Kaho Ng
Signed-off-by: Theodore Ts'o
Reviewed-by: Jan Kara
Reviewed-by: Darrick J. Wong
27 Jul, 2016
1 commit
-
Pull ext4 updates from Ted Ts'o:
"The major change this cycle is deleting ext4's copy of the file system
encryption code and switching things over to using the copies in
fs/crypto. I've updated the MAINTAINERS file to add an entry for
fs/crypto listing Jaegeuk Kim and myself as the maintainers.

There are also a number of bug fixes, most notably for some problems
found by American Fuzzy Lop (AFL) courtesy of Vegard Nossum. Also
fixed is a writeback deadlock detected by generic/130, and some
potential races in the metadata checksum code"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (21 commits)
ext4: verify extent header depth
ext4: short-cut orphan cleanup on error
ext4: fix reference counting bug on block allocation error
MAINTAINRES: fs-crypto maintainers update
ext4 crypto: migrate into vfs's crypto engine
ext2: fix filesystem deadlock while reading corrupted xattr block
ext4: fix project quota accounting without quota limits enabled
ext4: validate s_reserved_gdt_blocks on mount
ext4: remove unused page_idx
ext4: don't call ext4_should_journal_data() on the journal inode
ext4: Fix WARN_ON_ONCE in ext4_commit_super()
ext4: fix deadlock during page writeback
ext4: correct error value of function verifying dx checksum
ext4: avoid modifying checksum fields directly during checksum verification
ext4: check for extents that wrap around
jbd2: make journal y2038 safe
jbd2: track more dependencies on transaction commit
jbd2: move lockdep tracking to journal_s
jbd2: move lockdep instrumentation for jbd2 handles
ext4: respect the nobarrier mount option in nojournal mode
...
11 Jul, 2016
1 commit
-
This patch removes most of ext4's internal crypto code, then modifies
and adds some ext4-specific crypto code to use the generic facility.

Signed-off-by: Jaegeuk Kim
Signed-off-by: Theodore Ts'o
08 Jun, 2016
1 commit
-
This has submit_bh users pass in the operation and flags separately,
so submit_bh_wbc can setup the bio op and bi_rw flags on the bio that
is submitted.

Signed-off-by: Mike Christie
Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe
30 Apr, 2016
2 commits
-
Instead of just printing warning messages when the orphan list is
corrupted, declare the file system corrupted. If there are any
reserved inodes in the orphaned inode list, declare the file system
corrupted and stop right away to avoid doing more potential damage to
the file system.

Cc: stable@vger.kernel.org
Signed-off-by: Theodore Ts'o
-
If the orphaned inode list contains inode #5, ext4_iget() returns a
bad inode (since the bootloader inode should never be referenced
directly). Because of the bad inode, we end up processing the inode
repeatedly and this hangs the machine.

This can be reproduced via:

    mke2fs -t ext4 /tmp/foo.img 100
    debugfs -w -R "ssv last_orphan 5" /tmp/foo.img
    mount -o loop /tmp/foo.img /mnt

(But don't do this if you are using an unpatched kernel if you care
about the system staying functional. :-)

This bug was found by the port of American Fuzzy Lop into the kernel
to find file system problems[1]. (Since it *only* happens if inode #5
shows up on the orphan list --- 3, 7, 8, etc. won't do it, it's not
surprising that AFL needed two hours before it found it.)

[1] http://events.linuxfoundation.org/sites/events/files/slides/AFL%20filesystem%20fuzzing%2C%20Vault%202016_0.pdf

Cc: stable@vger.kernel.org
Reported-by: Vegard Nossum
Signed-off-by: Theodore Ts'o
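The reserved-inode rule from the two orphan-list commits above can be sketched as a small user-space predicate. This is only an illustration: the constant values, the function name, and the exemption for the root inode are assumptions made for the sketch, not text taken from the patches.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for this sketch, not taken from the commit messages. */
#define TOY_ROOT_INO   2u   /* root directory inode */
#define TOY_FIRST_INO 11u   /* first non-reserved inode number */

/*
 * Return true if an inode number read from the orphan list looks
 * plausible. Reserved inodes (such as the bootloader inode, #5) and
 * out-of-range numbers are rejected so the caller can declare the
 * file system corrupted instead of processing them.
 */
static bool toy_orphan_ino_is_valid(uint32_t ino, uint32_t max_ino)
{
    if (ino > max_ino)
        return false;
    if (ino < TOY_FIRST_INO && ino != TOY_ROOT_INO)
        return false;
    return true;
}
```

With a check of this shape, an orphan list entry of 5 is refused up front rather than being handed to the inode lookup again and again.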
10 Mar, 2016
1 commit
-
Signed-off-by: Adam Buchbinder
Signed-off-by: Theodore Ts'o
12 Feb, 2016
1 commit
-
When block group checksum is wrong, we call ext4_error() while holding
group spinlock from ext4_init_block_bitmap() or
ext4_init_inode_bitmap() which results in scheduling while in atomic.
Fix the issue by calling ext4_error() later after dropping the spinlock.

CC: stable@vger.kernel.org
Reported-by: Dmitry Vyukov
Signed-off-by: Jan Kara
Signed-off-by: Theodore Ts'o
Reviewed-by: Darrick J. Wong
09 Jan, 2016
1 commit
-
Signed-off-by: Li Xi
Signed-off-by: Theodore Ts'o
Reviewed-by: Andreas Dilger
Reviewed-by: Jan Kara
18 Oct, 2015
3 commits
-
Make the bitmap reading routines return real error codes (EIO,
EFSCORRUPTED, EFSBADCRC) which can then be reflected back to
userspace for more precise diagnosis work.

In particular, this means that mballoc no longer claims that we're out
of memory if the block bitmaps become corrupt.

Signed-off-by: Darrick J. Wong
Signed-off-by: Theodore Ts'o
-
Create separate predicate functions to test/set/clear feature flags,
thereby replacing the wordy old macros. Furthermore, clean out the
places where we open-coded feature tests.

Signed-off-by: Darrick J. Wong
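As a rough user-space sketch of what "separate predicate functions" per feature flag can look like, a macro can stamp out has/set/clear helpers for each flag. The struct, the macro name, and the flag value below are assumptions for illustration and are not copied from the kernel headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the superblock's three feature words. */
struct toy_features {
    uint32_t compat;
    uint32_t ro_compat;
    uint32_t incompat;
};

/* Illustrative flag value; treat it as an assumption. */
#define TOY_RO_COMPAT_METADATA_CSUM 0x0400u

/* Generate has/set/clear helpers for one flag in one feature word. */
#define TOY_FEATURE_FUNCS(name, word, flag)                              \
static inline bool toy_has_feature_##name(const struct toy_features *f) \
{                                                                        \
    return (f->word & (flag)) != 0;                                      \
}                                                                        \
static inline void toy_set_feature_##name(struct toy_features *f)       \
{                                                                        \
    f->word |= (flag);                                                   \
}                                                                        \
static inline void toy_clear_feature_##name(struct toy_features *f)     \
{                                                                        \
    f->word &= ~(flag);                                                  \
}

TOY_FEATURE_FUNCS(metadata_csum, ro_compat, TOY_RO_COMPAT_METADATA_CSUM)
```

A call site then reads toy_has_feature_metadata_csum(&f) instead of naming both the feature word and the flag constant, which is the readability gain the message describes.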
-
Instead of overloading EIO for CRC errors and corrupt structures,
return the same error codes that XFS returns for the same issues.

Signed-off-by: Darrick J. Wong
Signed-off-by: Theodore Ts'o
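A minimal sketch of the distinction these commits draw between I/O failures, checksum mismatches, and structural corruption. It assumes the usual Linux aliases (EFSCORRUPTED as EUCLEAN, EFSBADCRC as EBADMSG); the checksum routine and the structural rule are hypothetical stand-ins, not ext4's real ones.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Fallback values for non-Linux systems; illustrative only. */
#ifndef EUCLEAN
#define EUCLEAN 117
#endif
#ifndef EBADMSG
#define EBADMSG 74
#endif

/* Assumed aliases: corruption and bad-CRC get their own errno values. */
#define EFSCORRUPTED EUCLEAN
#define EFSBADCRC    EBADMSG

/* Hypothetical toy checksum: a plain byte sum, not a real CRC. */
static uint32_t toy_csum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}

/* Return 0 on success or a negative errno naming the kind of failure. */
static int toy_check_bitmap(const uint8_t *bitmap, size_t len,
                            uint32_t stored_csum, int read_failed)
{
    if (read_failed)
        return -EIO;            /* the device could not deliver the block */
    if (toy_csum(bitmap, len) != stored_csum)
        return -EFSBADCRC;      /* contents do not match the checksum */
    /* Structural rule for this sketch: bit 0 must always be in use. */
    if (len == 0 || !(bitmap[0] & 1))
        return -EFSCORRUPTED;   /* checksum is fine, structure is not */
    return 0;
}
```

A caller that simply propagates the return value lets userspace tell a failing disk apart from a damaged bitmap, instead of seeing EIO or an out-of-memory error for all three cases.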
24 Jul, 2015
1 commit
-
dquot_initialize() can now return error. Handle it where possible.
Acked-by: Theodore Ts'o
Signed-off-by: Jan Kara
01 Jun, 2015
1 commit
-
Factor out calls to ext4_inherit_context() and move them to
__ext4_new_inode(); this fixes a problem where ext4_tmpfile() wasn't
calling ext4_inherit_context(), so the temporary file wasn't
getting protected. Since the blocks for the tmpfile could end up on
disk, they really should be protected if the tmpfile is created within
the context of an encrypted directory.

Signed-off-by: Theodore Ts'o
19 May, 2015
1 commit
-
The superblock fields s_file_encryption_mode and s_dir_encryption_mode
are vestigial, so remove them as a cleanup. While we're at it, allow
file systems with both encryption and inline_data enabled at the same
time to work correctly. We can't have encrypted inodes with inline
data, but there's no reason to prohibit unencrypted inodes from using
the inline data feature.

Signed-off-by: Theodore Ts'o
27 Apr, 2015
1 commit
-
Pull fourth vfs update from Al Viro:
"d_inode() annotations from David Howells (sat in for-next since before
the beginning of merge window) + four assorted fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
RCU pathwalk breakage when running into a symlink overmounting something
fix I_DIO_WAKEUP definition
direct-io: only inc/dec inode->i_dio_count for file systems
fs/9p: fix readdir()
VFS: assorted d_backing_inode() annotations
VFS: fs/inode.c helpers: d_inode() annotations
VFS: fs/cachefiles: d_backing_inode() annotations
VFS: fs library helpers: d_inode() annotations
VFS: assorted weird filesystems: d_inode() annotations
VFS: normal filesystems (and lustre): d_inode() annotations
VFS: security/: d_inode() annotations
VFS: security/: d_backing_inode() annotations
VFS: net/: d_inode() annotations
VFS: net/unix: d_backing_inode() annotations
VFS: kernel/: d_inode() annotations
VFS: audit: d_backing_inode() annotations
VFS: Fix up some ->d_inode accesses in the chelsio driver
VFS: Cachefiles should perform fs modifications on the top layer only
VFS: AF_UNIX sockets should call mknod on the top layer only
16 Apr, 2015
2 commits
-
Also add the test dummy encryption mode flag so we can more easily
test the encryption patches using xfstests.

Signed-off-by: Michael Halcrow
Signed-off-by: Theodore Ts'o
-
that's the bulk of filesystem drivers dealing with inodes of their own
Signed-off-by: David Howells
Signed-off-by: Al Viro
12 Apr, 2015
2 commits
-
Signed-off-by: Uday Savagaonkar
Signed-off-by: Ildar Muslukhov
Signed-off-by: Michael Halcrow
Signed-off-by: Theodore Ts'o
-
Pulls block_write_begin() into fs/ext4/inode.c because it might need
to do a low-level read of the existing data, in which case we need to
decrypt it.

Signed-off-by: Michael Halcrow
Signed-off-by: Ildar Muslukhov
Signed-off-by: Theodore Ts'o
03 Apr, 2015
1 commit
-
Remove unused header files and header files which are included in
ext4.h.

Signed-off-by: Sheng Yong
Signed-off-by: Theodore Ts'o
30 Oct, 2014
1 commit
-
When we fail to load block bitmap in __ext4_new_inode() we will
dereference NULL pointer in ext4_journal_get_write_access(). So check
for error from ext4_read_block_bitmap().

Coverity-id: 989065
Cc: stable@vger.kernel.org
Signed-off-by: Jan Kara
Signed-off-by: Theodore Ts'o
13 Oct, 2014
1 commit
-
Besides the fact that this replacement improves code readability,
it also protects against errors caused by direct EXT4_SB(sb)->s_es
manipulation, which may result in an attempt to use uninitialized
csum machinery.

#Testcase_BEGIN
IMG=/dev/ram0
MNT=/mnt
mkfs.ext4 $IMG
mount $IMG $MNT
#Enable feature directly on disk, on mounted fs
tune2fs -O metadata_csum $IMG
# Provoke metadata update, likely to result in an OOPS
touch $MNT/test
umount $MNT
#Testcase_END

# Replacement script
@@
expression E;
@@
- EXT4_HAS_RO_COMPAT_FEATURE(E, EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)
+ ext4_has_metadata_csum(E)

https://bugzilla.kernel.org/show_bug.cgi?id=82201
Signed-off-by: Dmitry Monakhov
Signed-off-by: Theodore Ts'o
Cc: stable@vger.kernel.org
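The failure mode in the test case above can be sketched in user space. This assumes, as the message implies but does not spell out, that the helper answers based on whether the checksum machinery was set up at mount time rather than on the on-disk flag alone. All names and the flag value here are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_RO_COMPAT_METADATA_CSUM 0x0400u  /* illustrative value */

struct toy_sb {
    uint32_t on_disk_ro_compat;  /* may change under a mounted fs (tune2fs) */
    const void *chksum_driver;   /* set at mount time, NULL if not set up */
};

/* Open-coded test: trusts the on-disk flag alone. */
static bool toy_flag_says_csum(const struct toy_sb *sb)
{
    return (sb->on_disk_ro_compat & TOY_RO_COMPAT_METADATA_CSUM) != 0;
}

/* Helper-style test: true only when the checksum machinery is ready. */
static bool toy_has_metadata_csum(const struct toy_sb *sb)
{
    return sb->chksum_driver != NULL;
}
```

If the flag is switched on behind the kernel's back, the open-coded test says "yes" while the driver pointer is still NULL, which is the situation where checksum code would run uninitialized. The helper-style test keeps answering "no" until the machinery exists.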
13 Jul, 2014
1 commit
-
Fix potential null pointer dereferencing problem caused by e43bb4e612
("ext4: decrement free clusters/inodes counters when block group declared bad")

Reported-by: Dan Carpenter
Signed-off-by: Namjae Jeon
Signed-off-by: Ashish Sangwan
Signed-off-by: Theodore Ts'o
Reviewed-by: Lukas Czerner
06 Jul, 2014
1 commit
-
The first time that we allocate from an uninitialized inode allocation
bitmap, if the block allocation bitmap is also uninitialized, we need
to get write access to the block group descriptor before we start
modifying the block group descriptor flags and updating the free block
count, etc. Otherwise, there is the potential of a bad journal
checksum (if journal checksums are enabled), and of the file system
becoming inconsistent if we crash at exactly the wrong time.

Signed-off-by: Theodore Ts'o
Cc: stable@vger.kernel.org
26 Jun, 2014
1 commit
-
We should decrement the free clusters counter when the block bitmap is
marked as corrupt, and the free inodes counter when the allocation
bitmap is marked as corrupt, to avoid misunderstanding due to an
incorrect available size in the statfs result. The user can then get
an ENOSPC error immediately from write begin, without having to reach
writepages.

Cc: Darrick J. Wong
Reported-by: Amit Sahrawat
Signed-off-by: Namjae Jeon
Signed-off-by: Ashish Sangwan
08 Nov, 2013
1 commit
-
Many of the uses of get_random_bytes() do not actually need
cryptographically secure random numbers. Replace those uses with a
call to prandom_u32(), which is faster and which doesn't consume
entropy from the /dev/random driver.

Signed-off-by: "Theodore Ts'o"
29 Aug, 2013
2 commits
-
If the group descriptor fails validation, mark the whole blockgroup
corrupt so that the inode/block allocators skip this group. The
previous approach takes the risk of writing to a damaged group
descriptor; hopefully it was never the case that the [ib]bitmap fields
pointed to another valid block and got dirtied, since the memset would
fill the page with 1s.

Signed-off-by: Darrick J. Wong
Signed-off-by: "Theodore Ts'o"
-
If we detect either a discrepancy between the inode bitmap and the
inode counts or the inode bitmap fails to pass validation checks, mark
the block group corrupt and refuse to allocate or deallocate inodes
from the group.

Signed-off-by: Darrick J. Wong
Signed-off-by: "Theodore Ts'o"
17 Aug, 2013
1 commit
-
In no journal mode, if an inode has recently been deleted, we
shouldn't reuse it right away. Otherwise it's possible, after an
unclean shutdown, to hit a situation where a recently deleted inode
gets reused for some other purpose before the inode table block has
been written to disk. However, if the directory entry has been
updated, then the directory entry will be pointing at the old inode
contents.

E2fsck will make sure the file system is consistent after the
unclean shutdown. However, if the recently deleted inode is a
character mode device, or an inode with the immutable bit set, even
after the file system has been fixed up by e2fsck, it can be
possible for a *.pyc file to be pointing at a character mode
device, and when python tries to open the *.pyc file, Hilarity
Ensues. We could change all of userspace to be very suspicious
about stat'ing files before opening them, and clearing the
immutable flag if necessary --- or we can just avoid reusing an
inode number if it has been recently deleted.

Google-Bug-Id: 10017573
Signed-off-by: "Theodore Ts'o"
27 Jul, 2013
1 commit
-
When we try to allocate an inode, and there is a race between two
CPU's trying to grab the same inode, _and_ this inode is the last free
inode in the block group, make sure the group number is bumped before
we continue searching the rest of the block groups. Otherwise, we end
up searching the current block group twice, and we end up skipping
searching the last block group. So in the unlikely situation where
almost all of the inodes are allocated, it's possible that we will
return ENOSPC even though there might be free inodes in that last
block group.

Signed-off-by: "Theodore Ts'o"
Cc: stable@vger.kernel.org
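The effect of not bumping the group number can be reproduced with a toy search loop. This is a simplified model under stated assumptions: the function and parameter names are hypothetical, the "race" is simulated by a parameter, and a return of -1 stands in for ENOSPC.

```c
/*
 * free_inodes[g] holds the number of free inodes in group g and is
 * modified in place. race_group names the group in which another CPU
 * wins the race for the last free inode (-1 for no race).
 * bump_on_race selects the fixed (1) or the old (0) behaviour.
 * Returns the group an inode was found in, or -1 for "no space".
 */
static int toy_find_group(int *free_inodes, int ngroups, int start,
                          int race_group, int bump_on_race)
{
    int group = start;

    for (int i = 0; i < ngroups; i++) {
        if (free_inodes[group] > 0) {
            if (group != race_group)
                return group;
            /* Lost the race: the other CPU took the last free inode. */
            free_inodes[group] = 0;
            race_group = -1;
            if (!bump_on_race)
                continue;   /* old behaviour: group is not advanced */
        }
        /* Move on to the next block group, wrapping around. */
        if (++group == ngroups)
            group = 0;
    }
    return -1;
}
```

With four groups whose free counts are {1, 0, 0, 1}, a search starting at group 0 that loses the race there finds group 3 when the group number is bumped. Without the bump, it spends one of its four iterations re-checking group 0, never reaches group 3, and reports no space.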
05 Jun, 2013
1 commit
-
Reviewed-by: Zheng Liu
Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"
21 Apr, 2013
1 commit
-
As Dave Chinner pointed out at the 2013 LSF/MM workshop, it's
important that metadata I/O requests are marked as such to avoid
priority inversions caused by I/O bandwidth throttling.

Signed-off-by: "Theodore Ts'o"
20 Apr, 2013
1 commit
-
Inode allocation transaction is pretty heavy (246 credits with quotas
and extents before previous patch, still around 200 after it). This is
mostly due to credits required for allocation of quota structures
(credits there are heavily overestimated but it's difficult to make
better estimates if we don't want to wire non-trivial assumptions about
quota format into filesystem).

So move quota initialization out of allocation transaction. That way
transaction for quota structure allocation will be started only if we
need to look up quota structure on disk (rare) and furthermore it will
be started for each quota type separately, not for all of them at once.
This reduces maximum transaction size to 34 in most cases and to 73 in
the worst case.

[ Modified by tytso to clean up the cleanup paths for error handling.
Also use a separate call to ext4_std_error() for each failure so it
is easier for someone who is debugging a problem in this function to
determine which function call failed. ]

Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"
10 Apr, 2013
1 commit
-
This patch should fix sparse complaints about shadow declarations.
Signed-off-by: Dmitry Monakhov
Signed-off-by: "Theodore Ts'o"
12 Mar, 2013
1 commit
-
A user who was using an 8TB+ file system and with a very large flexbg
size (> 65536) could cause the atomic_t used in the struct flex_groups
to overflow. This was detected by the PaX security patchset:

http://forums.grsecurity.net/viewtopic.php?f=3&t=3289&p=12551#p12551

This bug was introduced in commit 9f24e4208f7e, so it's been around
since 2.6.30. :-(

Fix this by using an atomic64_t for struct orlov_stats's
free_clusters.

Signed-off-by: "Theodore Ts'o"
Reviewed-by: Lukas Czerner
Cc: stable@vger.kernel.org
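The arithmetic behind the overflow can be checked with a short sketch. It assumes 32768 clusters per block group, which is what a 4 KiB block size without bigalloc gives; that figure and the function names are assumptions, not part of the commit message.

```c
#include <stdint.h>

/* Free clusters summed over one flex group, computed in 64 bits. */
static int64_t toy_flex_free_clusters(int64_t groups_per_flex,
                                      int64_t clusters_per_group)
{
    return groups_per_flex * clusters_per_group;
}

/* Nonzero if the total no longer fits a signed 32-bit counter. */
static int toy_overflows_int32(int64_t value)
{
    return value > INT32_MAX;
}
```

At 65536 groups of 32768 clusters the product is 2^31, one more than a signed 32-bit counter can hold, and 65536 groups of 128 MiB each is 8 TiB. Both figures line up with the thresholds quoted in the message, and a 64-bit counter removes the limit for any realistic flex group size.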
15 Feb, 2013
1 commit
-
Some messages printed related to a WARN_ON(1) were printed using
KERN_NOTICE. Use KERN_WARNING or ext4_warning() instead so that
context related to the WARN_ON() is printed at the same printk warning
level (and log files, etc.)

Signed-off-by: "Theodore Ts'o"
10 Feb, 2013
1 commit
-
In ext4_{create,mknod,mkdir,symlink}(), don't start the journal handle
until the inode has been successfully allocated. In order to do this,
we need to start the handle in ext4_new_inode(). So create a new
variant of this function, ext4_new_inode_start_handle(), so the handle
can be created at the last possible minute, before we need to modify
the inode allocation bitmap block.

Signed-off-by: "Theodore Ts'o"
09 Feb, 2013
1 commit
-
So we can better understand what bits of ext4 are responsible for
long-running jbd2 handles, use jbd2__journal_start() so we can pass
context information for logging purposes.

The recommended way for finding the longer-running handles is:

    T=/sys/kernel/debug/tracing
    EVENT=$T/events/jbd2/jbd2_handle_stats
    echo "interval > 5" > $EVENT/filter
    echo 1 > $EVENT/enable

    ./run-my-fs-benchmark

    cat $T/trace > /tmp/problem-handles

This will list handles that were active for longer than 20ms. Having
longer-running handles is bad, because a commit started at the wrong
time could stall for those 20+ milliseconds, which could delay an
fsync() or an O_SYNC operation. Here is an example line from the
trace file describing a handle which lived on for 311 jiffies, or over
1.2 seconds:

postmark-2917 [000] .... 196.435786: jbd2_handle_stats: dev 254,32
tid 570 type 2 line_no 2541 interval 311 sync 0 requested_blocks 1
dirtied_blocks 0

Signed-off-by: "Theodore Ts'o"
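The interval field in the trace is in jiffies, so the figures above only line up for one particular tick rate. The sketch below assumes HZ=250, which is inferred from the message pairing "interval > 5" with 20ms; the function name is hypothetical.

```c
/* Convert a jiffies count to milliseconds for a given tick rate. */
static long toy_jiffies_to_ms(long jiffies, long hz)
{
    return jiffies * 1000 / hz;
}
```

At 250 ticks per second, 5 jiffies is 20 ms and 311 jiffies is 1244 ms, matching the "over 1.2 seconds" in the example. On a kernel built with a different HZ, the filter threshold would need scaling accordingly.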
11 Dec, 2012
1 commit
-
Signed-off-by: Tao Ma
Signed-off-by: "Theodore Ts'o"