Eric Lee / smarc-fsl-linux-kernel

04 Jun, 2020

2 commits

dfcd4489e ext4: drop ext4_journal_free_reserved()

Remove ext4_journal_free_reserved() function. It is never used.

Signed-off-by: Jan Kara
Reviewed-by: Andreas Dilger
Link: https://lore.kernel.org/r/20200520133119.1383-2-jack@suse.cz
Signed-off-by: Theodore Ts'o

Jan Kara
2020-06-04 11:16:53 +0800
4209ae12b ext4: handle ext4_mark_inode_dirty errors

ext4_mark_inode_dirty() can fail for real reasons. Ignoring its return
value may lead ext4 to ignore real failures that would result in
corruption / crashes. Harden ext4_mark_inode_dirty error paths to fail
as soon as possible and return errors to the caller whenever
appropriate.

One possible scenario in which this bug could have an effect is that,
while creating a new inode, its directory entry gets added
successfully, but while writing the inode itself mark_inode_dirty
returns an error which is ignored. This would result in an inconsistency
in which the directory entry points to a non-existent inode.
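
The scenario above can be modelled in user space. This is a minimal
sketch, not the ext4 code: struct dir_model, mark_inode_dirty_model() and
create_inode_model() are hypothetical stand-ins, and rolling back the
directory entry on failure is an assumption about what "fail as soon as
possible" could look like.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a directory holding a single entry slot. */
struct dir_model {
	bool has_entry;
};

/* Hypothetical stand-in for ext4_mark_inode_dirty(); fails on request. */
static int mark_inode_dirty_model(bool inject_failure)
{
	return inject_failure ? -EIO : 0;
}

/*
 * Add a directory entry, then write the inode. If writing the inode
 * fails, undo the directory entry and hand the error to the caller
 * instead of silently ignoring it.
 */
static int create_inode_model(struct dir_model *dir, bool inject_failure)
{
	int err;

	assert(dir != NULL);
	dir->has_entry = true;           /* directory entry added */
	err = mark_inode_dirty_model(inject_failure);
	if (err) {
		dir->has_entry = false;  /* assumed rollback of the entry */
		return err;
	}
	return 0;
}
```

With the error propagated, a caller sees -EIO and the directory no longer
points at an inode that was never written.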

Ran gce-xfstests smoke tests and verified that there were no
regressions.

Signed-off-by: Harshad Shirwadkar
Link: https://lore.kernel.org/r/20200427013438.219117-1-harshadshirwadkar@gmail.com
Signed-off-by: Theodore Ts'o

Harshad Shirwadkar
2020-06-04 11:16:50 +0800

26 Mar, 2020

1 commit

c8980e198 ext4: disable dioread_nolock whenever delayed allocation is disabled

The patch "ext4: make dioread_nolock the default" (244adf6426ee) causes
generic/422 to fail when run in kvm-xfstests' ext3conv test case. This
applies both the dioread_nolock and nodelalloc mount options, a
combination not previously tested by kvm-xfstests. The failure occurs
because the dioread_nolock code path splits a previously fallocated
multiblock extent into a series of single block extents when overwriting
a portion of that extent. That causes allocation of an extent tree leaf
node and a reshuffling of extents. Once writeback is completed, the
individual extents are recombined into a single extent, the extent is
moved again, and the leaf node is deleted. The difference in block
utilization before and after writeback due to the leaf node triggers the
failure.

The original reason for this behavior was to avoid ENOSPC when handling
I/O completions during writeback in the dioread_nolock code paths when
delayed allocation is disabled. It may no longer be necessary, because
code was added in the past to reserve extra space to solve this problem
when delayed allocation is enabled, and this code may also apply when
delayed allocation is disabled. Until this can be verified, don't use
the dioread_nolock code paths if delayed allocation is disabled.

Signed-off-by: Eric Whitney
Link: https://lore.kernel.org/r/20200319150028.24592-1-enwlinux@gmail.com
Signed-off-by: Theodore Ts'o

Eric Whitney
2020-03-26 22:57:42 +0800

18 Jan, 2020

1 commit

46797ad75 ext4: uninline ext4_inode_journal_mode()

Determining an inode's journaling mode has gotten more complicated over
time. Move ext4_inode_journal_mode() from an inline function into
ext4_jbd2.c to reduce the compiled code size.

Signed-off-by: Eric Biggers
Link: https://lore.kernel.org/r/20191209233602.117778-1-ebiggers@kernel.org
Signed-off-by: Theodore Ts'o
Reviewed-by: Jan Kara

Eric Biggers
2020-01-18 05:24:52 +0800

06 Nov, 2019

5 commits

83448bdfb ext4: Reserve revoke credits for freed blocks

So far we have reserved only a relatively high fixed amount of revoke
credits for each transaction. We over-reserved by a large amount for most
cases, but when freeing large directories or files with data journalling,
the fixed amount is not enough. In fact the worst case estimate is
inconveniently large (maximum extent size) for freeing of one extent.

We fix this by doing a proper estimate of the number of blocks that need
to be revoked when removing blocks from the inode due to truncate or
hole punching, and otherwise reserve just a small amount of revoke
credits for each transaction to accommodate freeing of an xattr block or
so.

Signed-off-by: Jan Kara
Link: https://lore.kernel.org/r/20191105164437.32602-23-jack@suse.cz
Signed-off-by: Theodore Ts'o

Jan Kara
2019-11-06 05:00:49 +0800
fdc3ef882 jbd2: Reserve space for revoke descriptor blocks

Extend functions for starting, extending, and restarting transaction
handles to take the number of revoke records the handle must be able to
accommodate. These functions then make sure the transaction has enough
credits to be able to store the resulting revoke descriptor blocks. Also,
the revoke code tracks the number of revoke records created by a handle to
catch situations where some place didn't reserve enough space for revoke
records. Similarly to standard transaction credits, space for unused
reserved revoke records is released when the handle is stopped.

On the ext4 side we currently take a simplistic approach of reserving
space for 1024 revoke records for any transaction. This grows the number
of credits reserved for each handle only by a few and is enough for any
normal workload so that we don't hit warnings in jbd2. We will refine
the logic in following commits.
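
To see why 1024 revoke records cost only a few credits, the number of
descriptor blocks can be estimated with simple arithmetic. This is a
sketch under stated assumptions: the function name is hypothetical, and
the block size (4096 bytes), header size (16 bytes) and record size
(8 bytes) used in the example are illustrative values, not figures taken
from this commit.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of descriptor blocks needed to hold `records` revoke records.
 * header_bytes and record_bytes are caller-supplied assumptions about
 * the on-disk layout.
 */
static size_t revoke_descriptor_blocks(size_t records, size_t block_bytes,
				       size_t header_bytes,
				       size_t record_bytes)
{
	size_t per_block;

	assert(record_bytes > 0 && block_bytes > header_bytes);
	per_block = (block_bytes - header_bytes) / record_bytes;
	assert(per_block > 0);
	return (records + per_block - 1) / per_block;  /* round up */
}
```

Under those assumed sizes a block holds (4096 - 16) / 8 = 510 records, so
revoke_descriptor_blocks(1024, 4096, 16, 8) comes to 3 blocks.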

Signed-off-by: Jan Kara
Link: https://lore.kernel.org/r/20191105164437.32602-20-jack@suse.cz
Signed-off-by: Theodore Ts'o

Jan Kara
2019-11-06 05:00:48 +0800
a9a8344ee ext4, jbd2: Provide accessor function for handle credits

Provide accessor function to get number of credits available in a handle
and use it from ext4. Later, computation of available credits won't be
so straightforward.

Reviewed-by: Theodore Ts'o
Signed-off-by: Jan Kara
Link: https://lore.kernel.org/r/20191105164437.32602-11-jack@suse.cz
Signed-off-by: Theodore Ts'o

Jan Kara
2019-11-06 05:00:48 +0800
a41303679 ext4: Provide function to handle transaction restarts

Provide ext4_journal_ensure_credits_fn() function to ensure a transaction
has a given amount of credits and call a helper function to prepare for
restarting a transaction. This allows us to remove some boilerplate code
from various places, add proper error handling for the case where
transaction extension or restart fails, and reduce the following changes
needed for proper revoke record reservation tracking.
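
The control flow described above can be sketched in user space as
"check, try to extend, otherwise prepare and restart". Everything below
is a hypothetical model: struct handle_model, ensure_credits_model() and
count_prepare() are invented for illustration, and the return convention
(0 when no restart was needed, 1 after a restart, negative on error) is
an assumption rather than something stated in this commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a handle and the space left in its transaction. */
struct handle_model {
	int credits;        /* credits currently held by the handle */
	int journal_free;   /* credits the running transaction can still give */
};

/* Example prepare callback: counts how often it was invoked. */
static int count_prepare(void *arg)
{
	(*(int *)arg)++;
	return 0;
}

/*
 * Make sure the handle holds at least check_cred credits.
 * Returns 0 if no restart was needed, 1 if the transaction was
 * restarted, or a negative value if the prepare callback failed.
 */
static int ensure_credits_model(struct handle_model *h, int check_cred,
				int extend_cred,
				int (*prepare)(void *), void *arg)
{
	int err;

	assert(h != NULL);
	if (h->credits >= check_cred)
		return 0;                        /* already enough */
	if (h->journal_free >= extend_cred) {    /* extension succeeds */
		h->journal_free -= extend_cred;
		h->credits += extend_cred;
		return 0;
	}
	if (prepare) {                           /* get ready for a restart */
		err = prepare(arg);
		if (err < 0)
			return err;
	}
	h->credits = extend_cred;                /* modelled restart */
	return 1;
}
```

Callers in this model only need to handle three outcomes, which is the
kind of boilerplate the helper is meant to centralise.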

Signed-off-by: Jan Kara
Link: https://lore.kernel.org/r/20191105164437.32602-10-jack@suse.cz
Signed-off-by: Theodore Ts'o

Jan Kara
2019-11-06 05:00:48 +0800
321238fbf ext4: Fix ext4_should_journal_data() for EA inodes

Similarly to directories, EA inodes do only journalled modifications to
their data. Change ext4_should_journal_data() to return true for them so
that we don't have to special-case them during truncate.

Signed-off-by: Jan Kara
Link: https://lore.kernel.org/r/20191105164437.32602-7-jack@suse.cz
Signed-off-by: Theodore Ts'o

Jan Kara
2019-11-06 05:00:47 +0800

21 Jun, 2019

1 commit

73131fbb0 ext4: use jbd2_inode dirty range scoping

Use the newly introduced jbd2_inode dirty range scoping to prevent us
from waiting forever when trying to complete a journal transaction.

Signed-off-by: Ross Zwisler
Signed-off-by: Theodore Ts'o
Reviewed-by: Jan Kara
Cc: stable@vger.kernel.org

Ross Zwisler
2019-06-21 05:26:26 +0800

25 Mar, 2019

1 commit

17403fa27 Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
"Miscellaneous ext4 bug fixes for 5.1"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: prohibit fstrim in norecovery mode
ext4: cleanup bh release code in ext4_ind_remove_space()
ext4: brelse all indirect buffer in ext4_ind_remove_space()
ext4: report real fs size after failed resize
ext4: add missing brelse() in add_new_gdb_meta_bg()
ext4: remove useless ext4_pin_inode()
ext4: avoid panic during forced reboot
ext4: fix data corruption caused by unaligned direct AIO
ext4: fix NULL pointer dereference while journal is aborted

Linus Torvalds
2019-03-25 04:41:37 +0800

15 Mar, 2019

1 commit

fa30dde38 ext4: fix NULL pointer dereference while journal is aborted

We see the following NULL pointer dereference while running xfstests
generic/475:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
PGD 8000000c84bad067 P4D 8000000c84bad067 PUD c84e62067 PMD 0
Oops: 0000 [#1] SMP PTI
CPU: 7 PID: 9886 Comm: fsstress Kdump: loaded Not tainted 5.0.0-rc8 #10
RIP: 0010:ext4_do_update_inode+0x4ec/0x760
...
Call Trace:
? jbd2_journal_get_write_access+0x42/0x50
? __ext4_journal_get_write_access+0x2c/0x70
? ext4_truncate+0x186/0x3f0
ext4_mark_iloc_dirty+0x61/0x80
ext4_mark_inode_dirty+0x62/0x1b0
ext4_truncate+0x186/0x3f0
? unmap_mapping_pages+0x56/0x100
ext4_setattr+0x817/0x8b0
notify_change+0x1df/0x430
do_truncate+0x5e/0x90
? generic_permission+0x12b/0x1a0

This is triggered because the NULL pointer handle->h_transaction was
dereferenced in function ext4_update_inode_fsync_trans().
I found that h_transaction was set to NULL in jbd2__journal_restart()
but the handle failed to attach to a new transaction because the journal
was aborted.

Fix this by checking the handle before updating the inode.
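
The shape of such a check can be illustrated with a small user-space
model. This is only a sketch: the *_model types and the function name
are hypothetical, and the exact condition used by the real fix is not
given in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the transaction, handle and inode. */
struct transaction_model {
	unsigned int tid;
};

struct handle_model {
	struct transaction_model *h_transaction;  /* NULL after a failed restart */
};

struct inode_model {
	unsigned int sync_tid;
};

/*
 * Record the transaction ID in the inode only when the handle is still
 * attached to a transaction. Returns true if the inode was updated.
 */
static bool update_inode_fsync_trans_model(struct handle_model *handle,
					   struct inode_model *inode)
{
	assert(inode != NULL);
	if (handle == NULL || handle->h_transaction == NULL)
		return false;             /* nothing to dereference */
	inode->sync_tid = handle->h_transaction->tid;
	return true;
}
```

A handle left without a transaction is skipped instead of being
dereferenced, and the inode keeps its previous value.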

Fixes: b436b9bef84d ("ext4: Wait for proper transaction commit on fsync")
Signed-off-by: Jiufei Xue
Signed-off-by: Theodore Ts'o
Reviewed-by: Joseph Qi
Cc: stable@kernel.org

Jiufei Xue
2019-03-15 11:19:22 +0800

24 Jan, 2019

1 commit

592ddec75 ext4: use IS_ENCRYPTED() to check encryption status

This commit removes the ext4 specific ext4_encrypted_inode() and makes
use of the generic IS_ENCRYPTED() macro to check for the encryption
status of an inode.

Reviewed-by: Eric Biggers
Signed-off-by: Chandan Rajendra
Signed-off-by: Eric Biggers

Chandan Rajendra
2019-01-24 12:56:43 +0800

18 Dec, 2017

1 commit

f51667685 ext4: fix up remaining files with SPDX cleanups

A number of ext4 source files were skipped because their copyright
permission statements didn't match the expected text used by the
automated conversion utilities. I've added SPDX tags for the rest.

While looking at some of these files, I've noticed that we have quite
a bit of variation on the licenses that were used --- in particular
some of the Red Hat licenses on the jbd2 files use a GPL2+ license,
and we have some files that have a LGPL-2.1 license (which was quite
surprising).

I've not attempted to do any license changes. Even if it is perfectly
legal to relicense to GPL 2.0-only for consistency's sake, that should
be done with ext4 developer community discussion.

Signed-off-by: Theodore Ts'o

Theodore Ts'o
2017-12-18 11:00:59 +0800

06 Aug, 2017

1 commit

c03b45b85 ext4, project: expand inode extra size if possible

When upgrading from the old format, the first attempt to set a project
id on an old file returns EOVERFLOW. However, once that file has been
dirtied (by touch, for example), changing the project id is allowed.
This might be confusing for users, so try to expand @i_extra_isize
here too.

Reported-by: Zhang Yi
Signed-off-by: Miao Xie
Signed-off-by: Wang Shilong
Signed-off-by: Theodore Ts'o

Miao Xie
2017-08-06 13:00:49 +0800

22 Jun, 2017

2 commits

c1a5d5f6a ext4: improve journal credit handling in set xattr paths

Both ext4_set_acl() and ext4_set_context() need to be made aware of
ea_inode feature when it comes to credits calculation.

Also add a sufficient credits check in ext4_xattr_set_handle() right
after xattr write lock is grabbed. Original credits calculation is done
outside the lock so there is a possibility that the initially calculated
credits are not sufficient anymore.

Signed-off-by: Tahsin Erdogan
Signed-off-by: Theodore Ts'o

Tahsin Erdogan
2017-06-22 10:28:40 +0800
e08ac99fa ext4: add largedir feature

This INCOMPAT_LARGEDIR feature allows larger directories to be created
in ldiskfs, both with directory sizes over 2GB and a maximum htree
depth of 3 instead of the current limit of 2. These features are needed
in order to exceed the current limit of approximately 10M entries in a
single directory.

This patch was originally written by Yang Sheng to support the Lustre server.

[ Bumped the credits needed to update an indexed directory -- tytso ]

Signed-off-by: Liang Zhen
Signed-off-by: Yang Sheng
Signed-off-by: Artem Blagodarenko
Signed-off-by: Theodore Ts'o
Reviewed-by: Andreas Dilger

Artem Blagodarenko
2017-06-22 09:09:57 +0800

11 Dec, 2016

1 commit

73b92a2a5 ext4: do not perform data journaling when data is encrypted

Currently data journalling is incompatible with encryption: enabling both
at the same time has never been supported by design, and would result in
unpredictable behavior. However, users are not precluded from turning on
both features simultaneously. This change programmatically replaces data
journaling for encrypted regular files with ordered data journaling mode.

Background:
Journaling encrypted data has not been supported because it operates on
buffer heads of the page in the page cache. Namely, when the commit
happens, which could be up to five seconds after caching, the commit
thread uses the buffer heads attached to the page to copy the contents of
the page to the journal. With encryption, it would have been required to
keep the bounce buffer with ciphertext for up to the aforementioned five
seconds, since the page cache can only hold plaintext and could not be
used for journaling. Alternatively, it would be required to setup the
journal to initiate a callback at the commit time to perform deferred
encryption - in this case, not only would the data have to be written
twice, but it would also have to be encrypted twice. This level of
complexity was not justified for a mode that in practice is very rarely
used because of the overhead from the data journalling.

Solution:
If data=journal has been set as a mount option for a filesystem, or if
journaling is enabled on a regular file, do not perform journaling if the
file is also encrypted, instead fall back to the data=ordered mode for the
file.

Rationale:
The intent is to allow seamless and proper filesystem operation when
journaling and encryption have both been enabled, and have these two
conflicting features gracefully resolved by the filesystem.

Fixes: 4461471107b7
Signed-off-by: Sergey Karamov
Signed-off-by: Theodore Ts'o
Cc: stable@vger.kernel.org

Sergey Karamov
2016-12-11 06:54:58 +0800

27 Jun, 2016

1 commit

d08854f5b ext4: optimize ext4_should_retry_alloc() to improve ENOSPC performance

If there are no pending blocks to be released after a commit, forcing
a journal commit has no hope of helping. It's possible that a commit
had just completed, so if there are now free blocks available for
allocation, it's worth retrying the allocation.

Reported-by: Chao Yu
Signed-off-by: Theodore Ts'o

Theodore Ts'o
2016-06-27 06:24:01 +0800

24 Apr, 2016

2 commits

ee0876bc6 ext4: do not ask jbd2 to write data for delalloc buffers

Currently we ask jbd2 to write all dirty allocated buffers before
committing a transaction when doing writeback of delay allocated blocks.
However this is unnecessary since we move all pages to writeback state
before dropping a transaction handle and then submit all the necessary
IO. We still need the transaction commit to wait for all the outstanding
writeback before flushing disk caches during transaction commit to avoid
data exposure issues though. Use the new jbd2 capability and ask it to
only wait for outstanding writeback during transaction commit when
writing back data in ext4_writepages().

Tested-by: "HUANG Weller (CM/ESW12-CN)"
Signed-off-by: Jan Kara
Signed-off-by: Theodore Ts'o

Jan Kara
2016-04-24 12:56:08 +0800
41617e1a8 jbd2: add support for avoiding data writes during transaction commits

Currently when filesystem needs to make sure data is on permanent
storage before committing a transaction it adds inode to transaction's
inode list. During transaction commit, jbd2 writes back all dirty
buffers that have allocated underlying blocks and waits for the IO to
finish. However when doing writeback for delayed allocated data, we
allocate blocks and immediately submit the data. Thus asking jbd2 to
write dirty pages just unnecessarily adds more work to jbd2 possibly
writing back other redirtied blocks.

Add support to jbd2 to allow filesystem to ask jbd2 to only wait for
outstanding data writes before committing a transaction and thus avoid
unnecessary writes.

Signed-off-by: Jan Kara
Signed-off-by: Theodore Ts'o

Jan Kara
2016-04-24 12:56:07 +0800

18 Oct, 2015

1 commit

e2b911c53 ext4: clean up feature test macros with predicate functions

Create separate predicate functions to test/set/clear feature flags,
thereby replacing the wordy old macros. Furthermore, clean out the
places where we open-coded feature tests.
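
The idea of generating test/set/clear predicates per feature flag can be
sketched as follows. This is a user-space illustration only:
struct sb_model, the FEATURE_COMPAT_FUNCS_MODEL() generator macro, the
generated function names and the flag value are all hypothetical and
are not the names introduced by this commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the superblock's compat feature word. */
struct sb_model {
	uint32_t feature_compat;
};

/* Illustrative flag value, chosen only for this sketch. */
#define FEATURE_COMPAT_JOURNAL_MODEL 0x0004u

/* Generate has/set/clear predicate functions for one feature flag. */
#define FEATURE_COMPAT_FUNCS_MODEL(name, flag)                        \
static inline bool has_feature_##name(const struct sb_model *sb)     \
{                                                                     \
	assert(sb != NULL);                                           \
	return (sb->feature_compat & (flag)) != 0;                    \
}                                                                     \
static inline void set_feature_##name(struct sb_model *sb)           \
{                                                                     \
	assert(sb != NULL);                                           \
	sb->feature_compat |= (flag);                                 \
}                                                                     \
static inline void clear_feature_##name(struct sb_model *sb)         \
{                                                                     \
	assert(sb != NULL);                                           \
	sb->feature_compat &= ~(uint32_t)(flag);                      \
}

FEATURE_COMPAT_FUNCS_MODEL(journal, FEATURE_COMPAT_JOURNAL_MODEL)
```

A call site then reads has_feature_journal(&sb) instead of spelling out
the mask test by hand, which is what makes open-coded feature tests easy
to find and replace.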

Signed-off-by: Darrick J. Wong

Darrick J. Wong
2015-10-18 04:18:43 +0800

11 Sep, 2014

1 commit

a2d4a646e ext4: don't use MAXQUOTAS value

MAXQUOTAS value defines maximum number of quota types VFS supports.
This isn't necessarily the number of types ext4 supports. Although
ext4 will support project quotas, use ext4 private definition for
consistency with other filesystems.

Signed-off-by: Jan Kara
Signed-off-by: Theodore Ts'o

Jan Kara
2014-09-11 23:15:15 +0800

12 May, 2014

1 commit

c197855ea ext4: make local functions static

I have been running make namespacecheck to look for unneeded globals, and
found these in ext4.

Signed-off-by: Stephen Hemminger
Signed-off-by: "Theodore Ts'o"

Stephen Hemminger
2014-05-12 22:50:23 +0800

29 Aug, 2013

1 commit

70261f568 ext4: Fix misspellings using 'codespell' tool

Signed-off-by: Anatol Pomozov
Signed-off-by: "Theodore Ts'o"

Anatol Pomozov
2013-08-29 02:40:12 +0800

05 Jun, 2013

2 commits

6b523df4f ext4: use transaction reservation for extent conversion in ext4_end_io

Later we would like to clear PageWriteback bit only after extent
conversion from unwritten to written extents is performed. However it
is not possible to start a transaction after PageWriteback is set
because that violates lock ordering (and is easy to deadlock). So we
have to reserve a transaction before locking pages and sending them
for IO and later we use the transaction for extent conversion from
ext4_end_io().

Reviewed-by: Zheng Liu
Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"

Jan Kara
2013-06-05 01:21:11 +0800
5fe2fe895 ext4: provide wrappers for transaction reservation calls

Reviewed-by: Zheng Liu
Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"

Jan Kara
2013-06-05 00:37:50 +0800

10 Apr, 2013

1 commit

f45a5ef91 ext4: improve credit estimate for EXT4_SINGLEDATA_TRANS_BLOCKS

Estimate of 27 credits for allocation of a block in extent based inode
is unnecessarily high. We can easily argue 20 is enough.

Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"

Jan Kara
2013-04-10 00:39:26 +0800

04 Apr, 2013

1 commit

5d3ee2085 ext4: fix journal callback list traversal

It is incorrect to use list_for_each_entry_safe() for journal callback
traversal because ->next may be removed by another task:
->ext4_mb_free_metadata()
->ext4_mb_free_metadata()
->ext4_journal_callback_del()

This results in the following issue:

WARNING: at lib/list_debug.c:62 __list_del_entry+0x1c0/0x250()
Hardware name:
list_del corruption. prev->next should be ffff88019a4ec198, but was 6b6b6b6b6b6b6b6b
Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode sg xhci_hcd button sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul ahci libahci pata_acpi ata_generic dm_mirror dm_region_hash dm_log dm_mod
Pid: 16400, comm: jbd2/dm-1-8 Tainted: G W 3.8.0-rc3+ #107
Call Trace:
[] warn_slowpath_common+0xad/0xf0
[] warn_slowpath_fmt+0x46/0x50
[] ? ext4_journal_commit_callback+0x99/0xc0
[] __list_del_entry+0x1c0/0x250
[] ext4_journal_commit_callback+0x6f/0xc0
[] jbd2_journal_commit_transaction+0x23a6/0x2570
[] ? try_to_del_timer_sync+0x82/0xa0
[] ? del_timer_sync+0x91/0x1e0
[] kjournald2+0x19f/0x6a0
[] ? wake_up_bit+0x40/0x40
[] ? bit_spin_lock+0x80/0x80
[] kthread+0x10e/0x120
[] ? __init_kthread_worker+0x70/0x70
[] ret_from_fork+0x7c/0xb0
[] ? __init_kthread_worker+0x70/0x70

This patch fixes the issue as follows:
- ext4_journal_commit_callback() makes list traversal truly safe
simply by always starting from list_head
- fix race between two ext4_journal_callback_del() and
ext4_journal_callback_try_del()
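
The "always start from the list head" idea can be shown with a small
single-threaded model. A "safe" iterator caches the next entry before
running the loop body, so it breaks if that cached entry is unlinked in
the meantime; re-reading head->next on every pass avoids the stale
pointer. The list helpers and the also_remove field below are
hypothetical, and the locking needed for the race mentioned above is
left out of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list node for the sketch. */
struct node {
	struct node *prev, *next;
	struct node *also_remove;  /* hypothetical: entry unlinked as a side effect */
};

static void list_init(struct node *head)
{
	head->prev = head->next = head;
}

static void list_add_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink n and leave it pointing at itself, so a second call is harmless. */
static void list_del_init(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;
}

/*
 * Process entries by repeatedly taking the first one. No "next" pointer
 * is cached across iterations, so it does not matter which other
 * entries disappear while one entry is being processed.
 */
static int drain_from_head(struct node *head)
{
	int processed = 0;

	assert(head != NULL);
	while (head->next != head) {
		struct node *n = head->next;

		list_del_init(n);
		if (n->also_remove)
			list_del_init(n->also_remove);
		processed++;
	}
	return processed;
}
```

If the first entry's side effect removes the second one, the loop simply
finds the third entry at the head on its next pass.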

Signed-off-by: Dmitry Monakhov
Signed-off-by: "Theodore Ts'o"
Reviewed-by: Jan Kara
Cc: stable@vger.kernel.com

Dmitry Monakhov
2013-04-04 10:08:52 +0800

10 Feb, 2013

2 commits

95eaefbde ext4: fix the number of credits needed for acl ops with inline data

Operations which modify extended attributes may need extra journal
credits if inline data is used, since there is a chance that some
extended attributes may need to get pushed to an external attribute
block.

Changes to reflect this were made in xattr.c, but they were missed in
fs/ext4/acl.c. To fix this, abstract the calculation of the number of
credits needed for xattr operations to an inline function defined in
ext4_jbd2.h, and use it in acl.c and xattr.c.

Also move the function declarations used in inline.c from xattr.h
(where they are non-obviously hidden, and caused problems since
ext4_jbd2.h needs to use the function ext4_has_inline_data), and move
them to ext4.h.

Signed-off-by: "Theodore Ts'o"
Reviewed-by: Tao Ma
Reviewed-by: Jan Kara

Theodore Ts'o
2013-02-10 04:23:03 +0800
64044abf0 ext4: fix the number of credits needed for ext4_unlink() and ext4_rmdir()

The ext4_unlink() and ext4_rmdir() don't actually release the blocks
associated with the file/directory. This gets done in a separate jbd2
handle called via ext4_evict_inode(). Thus, we don't need to reserve
lots of journal credits for the truncate.

Note that using too many journal credits is non-optimal because it can
lead to the journal transaction getting closed too early, before it is
strictly necessary.

Signed-off-by: "Theodore Ts'o"
Reviewed-by: Jan Kara

Theodore Ts'o
2013-02-10 04:06:24 +0800

09 Feb, 2013

1 commit

9924a92a8 ext4: pass context information to jbd2__journal_start()

So we can better understand what bits of ext4 are responsible for
long-running jbd2 handles, use jbd2__journal_start() so we can pass
context information for logging purposes.

The recommended way for finding the longer-running handles is:

T=/sys/kernel/debug/tracing
EVENT=$T/events/jbd2/jbd2_handle_stats
echo "interval > 5" > $EVENT/filter
echo 1 > $EVENT/enable

./run-my-fs-benchmark

cat $T/trace > /tmp/problem-handles

This will list handles that were active for longer than 20ms. Having
longer-running handles is bad, because a commit started at the wrong
time could stall for those 20+ milliseconds, which could delay an
fsync() or an O_SYNC operation. Here is an example line from the
trace file describing a handle which lived on for 311 jiffies, or over
1.2 seconds:

postmark-2917 [000] .... 196.435786: jbd2_handle_stats: dev 254,32
tid 570 type 2 line_no 2541 interval 311 sync 0 requested_blocks 1
dirtied_blocks 0
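
The figures above (a filter of 5 corresponding to 20ms, and 311 jiffies
to over 1.2 seconds) are consistent with a tick rate of 250 Hz. That
rate is inferred from the numbers and is an assumption; the helper name
below is hypothetical. A quick way to check the arithmetic:

```c
#include <assert.h>

/* Convert a jiffies count to milliseconds for a given tick rate (Hz). */
static unsigned long jiffies_to_ms_model(unsigned long jiffies,
					 unsigned int hz)
{
	assert(hz > 0);
	return jiffies * 1000UL / hz;
}
```

With hz = 250, 5 jiffies convert to 20 ms and 311 jiffies to 1244 ms. On
a kernel built with a different tick rate the filter value would need to
be scaled accordingly.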

Signed-off-by: "Theodore Ts'o"

Theodore Ts'o
2013-02-09 10:59:22 +0800

09 Nov, 2012

1 commit

37be2f59d ext4: remove ext4_handle_release_buffer()

ext4_handle_release_buffer() was intended to remove journal
write access from a buffer, but it doesn't actually do anything
at all other than add a BUFFER_TRACE point, but it's not reliably
used for that either. Remove all the associated dead code.

Signed-off-by: Eric Sandeen
Signed-off-by: "Theodore Ts'o"
Reviewed-by: Carlos Maiolino

Eric Sandeen
2012-11-09 00:22:46 +0800

23 Jul, 2012

2 commits

b50924c2c ext4: remove unnecessary argument from __ext4_handle_dirty_metadata()

The '__ext4_handle_dirty_metadata()' does not need the 'now' argument
anymore and we can kill it.

Signed-off-by: Artem Bityutskiy
Signed-off-by: "Theodore Ts'o"
Reviewed-by: Jan Kara

Artem Bityutskiy
2012-07-23 08:37:31 +0800
7c319d328 ext4: make quota as first class supported feature

This patch adds support for quotas as a first class feature in ext4;
which is to say, the quota files are stored in hidden inodes as file
system metadata, instead of as separate files visible in the file system
directory hierarchy.

It is based on the proposal at:
https://ext4.wiki.kernel.org/index.php/Design_For_1st_Class_Quota_in_Ext4

This patch introduces a new feature - EXT4_FEATURE_RO_COMPAT_QUOTA
which, when turned on, enables quota accounting at mount time
itself. Also, the quota inodes are stored in two additional superblock
fields. Some changes introduced by this patch that should be pointed
out are:

1) Two new ext4-superblock fields - s_usr_quota_inum and
s_grp_quota_inum for storing the quota inodes in use.
2) Default quota inodes are: inode#3 for tracking userquota and inode#4
for tracking group quota. The superblock fields can be set to use
other inodes as well.
3) If the QUOTA feature and corresponding quota inodes are set in
superblock, the quota usage tracking is turned on at mount time. On
'quotaon' ioctl, the quota limits enforcement is turned
on. 'quotaoff' ioctl turns off only the limits enforcement in this
case.
4) When QUOTA feature is in use, the quota mount options 'quota',
'usrquota', 'grpquota' are ignored by the kernel.
5) mke2fs or tune2fs can be used to set the QUOTA feature and initialize
quota inodes. The default reserved inodes will not be visible to user
as regular files.
6) The quota-tools will need to be modified to support hidden quota
files on ext4. E2fsprogs will also include support for creating and
fixing quota files.
7) Support is only for the new V2 quota file format.

Tested-by: Jan Kara
Reviewed-by: Jan Kara
Reviewed-by: Johann Lombardi
Signed-off-by: Aditya Kali
Signed-off-by: "Theodore Ts'o"

Aditya Kali
2012-07-23 08:21:31 +0800

30 Apr, 2012

1 commit

a9c473178 ext4: calculate and verify superblock checksum

Calculate and verify the superblock checksum. Since the UUID and
block group number are embedded in each copy of the superblock, we
need only checksum the entire block. Refactor some of the code to
eliminate open-coding of the checksum update call.
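
A checksum over a whole block can be sketched in user space as below.
The assumptions are loud ones: the message does not name the algorithm
or the field layout, so the use of CRC32C, the all-ones seed, the
trailing 4-byte checksum field and every function name here are
illustrative choices, not the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32C (Castagnoli) update, reflected polynomial 0x82F63B78. */
static uint32_t crc32c_update(uint32_t crc, const void *buf, size_t len)
{
	const uint8_t *p = buf;

	while (len--) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/*
 * Checksum every byte of the block except an assumed trailing 4-byte
 * checksum field.
 */
static uint32_t block_csum_model(const uint8_t *block, size_t block_len)
{
	assert(block != NULL && block_len > 4);
	return crc32c_update(0xFFFFFFFFu, block, block_len - 4);
}

/* Return nonzero when the stored value matches the recomputed one. */
static int block_csum_verify_model(const uint8_t *block, size_t block_len,
				   uint32_t stored)
{
	return block_csum_model(block, block_len) == stored;
}
```

Because the covered bytes include whatever identifies the copy (the UUID
and block group number mentioned above), no extra seed material is
needed in this model; changing any covered byte makes verification fail.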

Signed-off-by: Darrick J. Wong
Signed-off-by: "Theodore Ts'o"

Darrick J. Wong
2012-04-30 06:29:10 +0800

21 Feb, 2012

2 commits

18aadd47f ext4: expand commit callback and ...

The per-commit callback was used by mballoc code to manage free space
bitmaps after deleted blocks have been released. This patch expands
it to support multiple different callbacks, to allow other things to
be done after the commit has been completed.

Signed-off-by: Bobi Jam
Signed-off-by: Andreas Dilger
Signed-off-by: "Theodore Ts'o"

Bobi Jam
2012-02-21 06:53:02 +0800
3d2b15826 ext4: ignore EXT4_INODE_JOURNAL_DATA flag with delalloc

Ext4 does not support data journalling with delayed allocation enabled.
We do not even allow mounting the file system with both delayed
allocation and data journalling enabled. However, the flag can be set via
FS_IOC_SETFLAGS, so we can hit an inode with EXT4_INODE_JOURNAL_DATA set
even on a file system mounted with delayed allocation (the default), and
that's where the problem arises. The easiest way to reproduce this
problem is with the following set of commands:

mkfs.ext4 /dev/sdd
mount /dev/sdd /mnt/test1
dd if=/dev/zero of=/mnt/test1/file bs=1M count=4
chattr +j /mnt/test1/file
dd if=/dev/zero of=/mnt/test1/file bs=1M count=4 conv=notrunc
chattr -j /mnt/test1/file

Additionally it can be reproduced quite reliably with xfstests 272 and
269. In fact the above reproducer is a part of test 272.

To fix this we should ignore the EXT4_INODE_JOURNAL_DATA inode flag if
the file system is mounted with delayed allocation. This can be easily
done by fixing ext4_should_*_data() functions to ignore data journal
flag when delalloc is set (suggested by Ted). We also have to set the
appropriate address space operations for the inode (again, ignoring data
journal flag if delalloc enabled).

Additionally this commit introduces ext4_inode_journal_mode() function
because ext4_should_*_data() has already had a lot of common code and
this change is putting it all into one function so it is easier to
read.
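
The decision described above can be modelled as one function that the
individual predicates consult. This is a user-space sketch: the enum,
the structs, the function name and the order of the checks are
assumptions made for illustration, not the code added by this commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum journal_mode_model {
	MODE_JOURNAL_DATA,
	MODE_ORDERED_DATA,
	MODE_WRITEBACK_DATA
};

/* Hypothetical view of the mount options that matter here. */
struct mount_model {
	bool has_journal;
	bool delalloc;
	enum journal_mode_model data_mode;  /* data= mount option */
};

/* Hypothetical view of the inode properties that matter here. */
struct inode_model {
	bool is_regular;
	bool journal_data_flag;  /* per-inode flag, e.g. set by chattr +j */
};

/* Single place that decides which data mode applies to an inode. */
static enum journal_mode_model
inode_journal_mode_model(const struct mount_model *m,
			 const struct inode_model *i)
{
	assert(m != NULL && i != NULL);
	if (!m->has_journal)
		return MODE_WRITEBACK_DATA;
	if (!i->is_regular || m->data_mode == MODE_JOURNAL_DATA)
		return MODE_JOURNAL_DATA;
	/* The per-inode flag is ignored when delayed allocation is on. */
	if (i->journal_data_flag && !m->delalloc)
		return MODE_JOURNAL_DATA;
	return m->data_mode;
}
```

In this model a regular file flagged with chattr +j on a delalloc,
data=ordered mount is still treated as ordered, while the same file on a
nodelalloc mount is journalled.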

Successfully tested with xfstests in following configurations:

delalloc + data=ordered
delalloc + data=writeback
data=journal
nodelalloc + data=ordered
nodelalloc + data=writeback
nodelalloc + data=journal

Signed-off-by: Lukas Czerner
Signed-off-by: "Theodore Ts'o"
Cc: stable@vger.kernel.org

Lukas Czerner
2012-02-21 06:53:00 +0800

13 Aug, 2011

1 commit

441c85085 ext4: Fix ext4_should_writeback_data() for no-journal mode

ext4_should_writeback_data() had an incorrect sequence of
tests to determine if it should return 0 or 1: in
particular, even in no-journal mode, 0 was being returned
for a non-regular-file inode.

This meant that, in non-journal mode, we would use
ext4_journalled_aops for directories, symlinks, and other
non-regular files. However, calling journalled aop
callbacks when there is no valid handle, can cause problems.

This would cause a kernel crash with Jan Kara's commit
2d859db3e4 ("ext4: fix data corruption in inodes with
journalled data"), because we now dereference 'handle' in
ext4_journalled_write_end().

I also added BUG_ONs to check for a valid handle in the
obviously journal-only aops callbacks.

I tested this running xfstests with a scratch device in
these modes:

- no-journal
- data=ordered
- data=writeback
- data=journal

All work fine; the data=journal run has many failures and a
crash in xfstests 074, but this is no different from a
vanilla kernel.

Signed-off-by: Curt Wohlgemuth
Signed-off-by: "Theodore Ts'o"
Cc: stable@kernel.org

Curt Wohlgemuth
2011-08-13 23:25:18 +0800

09 May, 2011

1 commit

2cd05cc39 ext4: remove unneeded ext4_journal_get_undo_access

The block allocation code used to use jbd2_journal_get_undo_access as
a way to make changes that wouldn't show up until the commit took
place. The new multi-block allocation code has its own way of
preventing newly freed blocks from getting reused until the commit
takes place (it avoids updating the buddy bitmaps until the commit is
done), so we don't need to use jbd2_journal_get_undo_access(), which
has extra overhead compared to jbd2_journal_get_write_access().

There was one last vestigial use of ext4_journal_get_undo_access() in
ext4_add_groupblocks(); change it to use ext4_journal_get_write_access()
and then remove the ext4_journal_get_undo_access() support.

Signed-off-by: "Theodore Ts'o"

Theodore Ts'o
2011-05-09 22:58:45 +0800