Eric Lee / smarc-fsl-linux-kernel

04 Feb, 2018

1 commit

30ac846da xfs: log recovery should replay deferred ops in order ... Browse Code »

[ Upstream commit 509955823cc9cc225c05673b1b83d70ca70c5c60 ]

As part of testing log recovery with dm_log_writes, Amir Goldstein
discovered an error in the deferred ops recovery that lead to corruption
of the filesystem metadata if a reflink+rmap filesystem happened to shut
down midway through a CoW remap:

"This is what happens [after failed log recovery]:

"Phase 1 - find and verify superblock...
"Phase 2 - using internal log
" - zero log...
" - scan filesystem freespace and inode maps...
" - found root inode chunk
"Phase 3 - for each AG...
" - scan (but don't clear) agi unlinked lists...
" - process known inodes and perform inode discovery...
" - agno = 0
"data fork in regular inode 134 claims CoW block 376
"correcting nextents for inode 134
"bad data fork in inode 134
"would have cleared inode 134"

Hou Tao dissected the log contents of exactly such a crash:

"According to the implementation of xfs_defer_finish(), these ops should
be completed in the following sequence:

"Have been done:
"(1) CUI: Oper (160)
"(2) BUI: Oper (161)
"(3) CUD: Oper (194), for CUI Oper (160)
"(4) RUI A: Oper (197), free rmap [0x155, 2, -9]

"Should be done:
"(5) BUD: for BUI Oper (161)
"(6) RUI B: add rmap [0x155, 2, 137]
"(7) RUD: for RUI A
"(8) RUD: for RUI B

"Actually be done by xlog_recover_process_intents()
"(5) BUD: for BUI Oper (161)
"(6) RUI B: add rmap [0x155, 2, 137]
"(7) RUD: for RUI B
"(8) RUD: for RUI A

"So the rmap entry [0x155, 2, -9] for COW should be freed firstly,
then a new rmap entry [0x155, 2, 137] will be added. However, as we can see
from the log record in post_mount.log (generated after umount) and the trace
print, the new rmap entry [0x155, 2, 137] are added firstly, then the rmap
entry [0x155, 2, -9] are freed."

When reconstructing the internal log state from the log items found on
disk, it's required that deferred ops replay in exactly the same order
that they would have had the filesystem not gone down. However,
replaying unfinished deferred ops can create /more/ deferred ops. These
new deferred ops are finished in the wrong order. This causes fs
corruption and replay crashes, so let's create a single defer_ops to
handle the subsequent ops created during replay, then use one single
transaction at the end of log recovery to ensure that everything is
replayed in the same order as they're supposed to be.

Reported-by: Amir Goldstein
Analyzed-by: Hou Tao
Reviewed-by: Christoph Hellwig
Tested-by: Amir Goldstein
Signed-off-by: Darrick J. Wong
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Darrick J. Wong
2018-02-04 00:39:06 +0800

20 Dec, 2017

1 commit

742b570da xfs: fix log block underflow during recovery cycle verification ... Browse Code »

[ Upstream commit 9f2a4505800607e537e9dd9dea4f55c4b0c30c7a ]

It is possible for mkfs to format very small filesystems with too
small of an internal log with respect to the various minimum size
and block count requirements. If this occurs when the log happens to
be smaller than the scan window used for cycle verification and the
scan wraps the end of the log, the start_blk calculation in
xlog_find_head() underflows and leads to an attempt to scan an
invalid range of log blocks. This results in log recovery failure
and a failed mount.

Since there may be filesystems out in the wild with this kind of
geometry, we cannot simply refuse to mount. Instead, cap the scan
window for cycle verification to the size of the physical log. This
ensures that the cycle verification proceeds as expected when the
scan wraps the end of the log.

Reported-by: Zorro Lang
Signed-off-by: Brian Foster
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Brian Foster
2017-12-20 17:10:31 +0800

02 Sep, 2017

1 commit

7bf7a193a xfs: fix compiler warnings ... Browse Code »

Fix up all the compiler warnings that have crept in.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig

Darrick J. Wong
2017-09-02 23:22:19 +0800

23 Aug, 2017

5 commits

e67d3d424 xfs: add log recovery tracepoint for head/tail ... Browse Code »

Torn write detection and tail overwrite detection can shift the log
head and tail respectively in the event of CRC mismatch or
corruption errors. Add a high-level log recovery tracepoint to dump
the final log head/tail and make those values easily attainable in
debug/diagnostic situations.

Signed-off-by: Brian Foster
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong

Brian Foster
2017-08-23 00:22:24 +0800
a4c9b34d6 xfs: handle -EFSCORRUPTED during head/tail verification ... Browse Code »

Torn write and tail overwrite detection both trigger only on
-EFSBADCRC errors. While this is the most likely failure scenario
for each condition, -EFSCORRUPTED is still possible in certain cases
depending on what ends up on disk when a torn write or partial tail
overwrite occurs. For example, an invalid log record h_len can lead
to an -EFSCORRUPTED error when running the log recovery CRC pass.

Therefore, update log head and tail verification to trigger the
associated head/tail fixups in the event of -EFSCORRUPTED errors
along with -EFSBADCRC. Also, -EFSCORRUPTED can currently be returned
from xlog_do_recovery_pass() before rhead_blk is initialized if the
first record encountered happens to be corrupted. This leads to an
incorrect 'first_bad' return value. Initialize rhead_blk earlier in
the function to address that problem as well.

Signed-off-by: Brian Foster
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong

Brian Foster
2017-08-23 00:22:24 +0800
4a4f66eac xfs: fix log recovery corruption error due to tail overwrite ... Browse Code »

If we consider the case where the tail (T) of the log is pinned long
enough for the head (H) to push and block behind the tail, we can
end up blocked in the following state without enough free space (f)
in the log to satisfy a transaction reservation:

0 phys. log N
[-------HffT---H'--T'---]

The last good record in the log (before H) refers to T. The tail
eventually pushes forward (T') leaving more free space in the log
for writes to H. At this point, suppose space frees up in the log
for the maximum of 8 in-core log buffers to start flushing out to
the log. If this pushes the head from H to H', these next writes
overwrite the previous tail T. This is safe because the items logged
from T to T' have been written back and removed from the AIL.

If the next log writes (H -> H') happen to fail and result in
partial records in the log, the filesystem shuts down having
overwritten T with invalid data. Log recovery correctly locates H on
the subsequent mount, but H still refers to the now corrupted tail
T. This results in log corruption errors and recovery failure.

Since the tail overwrite results from otherwise correct runtime
behavior, it is up to log recovery to try and deal with this
situation. Update log recovery tail verification to run a CRC pass
from the first record past the tail to the head. This facilitates
error detection at T and moves the recovery tail to the first good
record past H' (similar to truncating the head on torn write
detection). If corruption is detected beyond the range possibly
affected by the max number of iclogs, the log is legitimately
corrupted and log recovery failure is expected.

Signed-off-by: Brian Foster
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong

Brian Foster
2017-08-23 00:22:23 +0800
5297ac1f6 xfs: always verify the log tail during recovery ... Browse Code »

Log tail verification currently only occurs when torn writes are
detected at the head of the log. This was introduced because a
change in the head block due to torn writes can lead to a change in
the tail block (each log record header references the current tail)
and the tail block should be verified before log recovery proceeds.

Tail corruption is possible outside of torn write scenarios,
however. For example, partial log writes can be detected and cleared
during the initial head/tail block discovery process. If the partial
write coincides with a tail overwrite, the log tail is corrupted and
recovery fails.

To facilitate correct handling of log tail overwites, update log
recovery to always perform tail verification. This is necessary to
detect potential tail overwrite conditions when torn writes may not
have occurred. This changes normal (i.e., no torn writes) recovery
behavior slightly to detect and return CRC related errors near the
tail before actual recovery starts.

Signed-off-by: Brian Foster
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong

Brian Foster
2017-08-23 00:22:23 +0800
284f1c2c9 xfs: fix recovery failure when log record header wraps log end ... Browse Code »

The high-level log recovery algorithm consists of two loops that
walk the physical log and process log records from the tail to the
head. The first loop handles the case where the tail is beyond the
head and processes records up to the end of the physical log. The
subsequent loop processes records from the beginning of the physical
log to the head.

Because log records can wrap around the end of the physical log, the
first loop mentioned above must handle this case appropriately.
Records are processed from in-core buffers, which means that this
algorithm must split the reads of such records into two partial
I/Os: 1.) from the beginning of the record to the end of the log and
2.) from the beginning of the log to the end of the record. This is
further complicated by the fact that the log record header and log
record data are read into independent buffers.

The current handling of each buffer correctly splits the reads when
either the header or data starts before the end of the log and wraps
around the end. The data read does not correctly handle the case
where the prior header read wrapped or ends on the physical log end
boundary. blk_no is incremented to or beyond the log end after the
header read to point to the record data, but the split data read
logic triggers, attempts to read from an invalid log block and
ultimately causes log recovery to fail. This can be reproduced
fairly reliably via xfstests tests generic/047 and generic/388 with
large iclog sizes (256k) and small (10M) logs.

If the record header read has pushed beyond the end of the physical
log, the subsequent data read is actually contiguous. Update the
data read logic to detect the case where blk_no has wrapped, mod it
against the log size to read from the correct address and issue one
contiguous read for the log data buffer. The log record is processed
as normal from the buffer(s), the loop exits after the current
iteration and the subsequent loop picks up with the first new record
after the start of the log.

Signed-off-by: Brian Foster
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong

Brian Foster
2017-08-23 00:22:23 +0800

11 Jul, 2017

1 commit

642338ba3 Merge tag 'xfs-4.13-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux ... Browse Code »

Pull XFS updates from Darrick Wong:
"Here are some changes for you for 4.13. For the most part it's fixes
for bugs and deadlock problems, and preparation for online fsck in
some future merge window.

- Avoid quotacheck deadlocks

- Fix transaction overflows when bunmapping fragmented files

- Refactor directory readahead

- Allow admin to configure if ASSERT is fatal

- Improve transaction usage detail logging during overflows

- Minor cleanups

- Don't leak log items when the log shuts down

- Remove double-underscore typedefs

- Various preparation for online scrubbing

- Introduce new error injection configuration sysfs knobs

- Refactor dq_get_next to use extent map directly

- Fix problems with iterating the page cache for unwritten data

- Implement SEEK_{HOLE,DATA} via iomap

- Refactor XFS to use iomap SEEK_HOLE and SEEK_DATA

- Don't use MAXPATHLEN to check on-disk symlink target lengths"

* tag 'xfs-4.13-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (48 commits)
xfs: don't crash on unexpected holes in dir/attr btrees
xfs: rename MAXPATHLEN to XFS_SYMLINK_MAXLEN
xfs: fix contiguous dquot chunk iteration livelock
xfs: Switch to iomap for SEEK_HOLE / SEEK_DATA
vfs: Add iomap_seek_hole and iomap_seek_data helpers
vfs: Add page_cache_seek_hole_data helper
xfs: remove a whitespace-only line from xfs_fs_get_nextdqblk
xfs: rewrite xfs_dq_get_next_id using xfs_iext_lookup_extent
xfs: Check for m_errortag initialization in xfs_errortag_test
xfs: grab dquots without taking the ilock
xfs: fix semicolon.cocci warnings
xfs: Don't clear SGID when inheriting ACLs
xfs: free cowblocks and retry on buffered write ENOSPC
xfs: replace log_badcrc_factor knob with error injection tag
xfs: convert drop_writes to use the errortag mechanism
xfs: remove unneeded parameter from XFS_TEST_ERROR
xfs: expose errortag knobs via sysfs
xfs: make errortag a per-mountpoint structure
xfs: free uncommitted transactions during log recovery
xfs: don't allow bmap on rt files
...

Linus Torvalds
2017-07-11 01:51:53 +0800

25 Jun, 2017

1 commit

39775431f xfs: free uncommitted transactions during log recovery ... Browse Code »

Log recovery allocates in-core transaction and member item data
structures on-demand as it processes the on-disk log. Transactions
are allocated on first encounter on-disk and stored in a hash table
structure where they are easily accessible for subsequent lookups.
Transaction items are also allocated on demand and are attached to
the associated transactions.

When a commit record is encountered in the log, the transaction is
committed to the fs and the in-core structures are freed. If a
filesystem crashes or shuts down before all in-core log buffers are
flushed to the log, however, not all transactions may have commit
records in the log. As expected, the modifications in such an
incomplete transaction are not replayed to the fs. The in-core data
structures for the partial transaction are never freed, however,
resulting in a memory leak.

Update xlog_do_recovery_pass() to first correctly initialize the
hash table array so empty lists can be distinguished from populated
lists on function exit. Update xlog_recover_free_trans() to always
remove the transaction from the list prior to freeing the associated
memory. Finally, walk the hash table of transaction lists as the
last step before it goes out of scope and free any transactions that
may remain on the lists. This prevents a memory leak of partial
transactions in the log.

Signed-off-by: Brian Foster
Reviewed-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong

Brian Foster
2017-06-25 01:11:41 +0800

20 Jun, 2017

1 commit

c8ce540db xfs: remove double-underscore integer types ... Browse Code »

This is a purely mechanical patch that removes the private
__{u,}int{8,16,32,64}_t typedefs in favor of using the system
{u,}int{8,16,32,64}_t typedefs. This is the sed script used to perform
the transformation and fix the resulting whitespace and indentation
errors:

s/typedef\t__uint8_t/typedef __uint8_t\t/g
s/typedef\t__uint/typedef __uint/g
s/typedef\t__int$[0-9]*$_t/typedef int\1_t\t/g
s/__uint8_t\t/__uint8_t\t\t/g
s/__uint/uint/g
s/__int$[0-9]*$_t\t/__int\1_t\t\t/g
s/__int/int/g
/^typedef.*int[0-9]*_t;$/d

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig

Darrick J. Wong
2017-06-20 05:11:33 +0800

05 Jun, 2017

1 commit

d905fdaaa xfs: use the common helper uuid_is_null() ... Browse Code »

Use the common helper uuid_is_null() and remove the xfs specific
helper uuid_is_nil().

The common helper does not check for the NULL pointer value as
xfs helper did, but xfs code never calls the helper with a pointer
that can be NULL.

Conform comments and warning strings to use the term 'null uuid'
instead of 'nil uuid', because this is the terminology used by
lib/uuid.c and its users. It is also the terminology used in
userspace by libuuid and xfsprogs.

Signed-off-by: Amir Goldstein
[hch: remove now unused uuid.[ch]]
Signed-off-by: Christoph Hellwig
Reviewed-by: Brian Foster
Reviewed-by: Andy Shevchenko

Amir Goldstein
2017-06-05 22:59:08 +0800

09 May, 2017

1 commit

6e7c2b4dd scripts/spelling.txt: add "intialise(d)" pattern and fix typo instances ... Browse Code »

Fix typos and add the following to the scripts/spelling.txt:

intialisation||initialisation
intialised||initialised
intialise||initialise

This commit does not intend to change the British spelling itself.

Link: http://lkml.kernel.org/r/1481573103-11329-18-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Masahiro Yamada
2017-05-09 08:15:13 +0800

07 Dec, 2016

1 commit

a444d72e6 Merge branch 'xfs-4.10-misc-fixes-3' into for-next Browse Code »

Dave Chinner
2016-12-07 14:42:30 +0800

05 Dec, 2016

3 commits

cae028df5 xfs: optimise CRC updates ... Browse Code »

Nick Piggin reported that the CRC overhead in an fsync heavy
workload was higher than expected on a Power8 machine. Part of this
was to do with the fact that the power8 CRC implementation is not
efficient for CRC lengths of less than 512 bytes, and so the way we
split the CRCs over the CRC field means a lot of the CRCs are
reduced to being less than than optimal size.

To optimise this, change the CRC update mechanism to zero the CRC
field first, and then compute the CRC in one pass over the buffer
and write the result back into the buffer. We can do this safely
because anything writing a CRC has exclusive access to the buffer
the CRC is being calculated over.

We leave the CRC verify code the same - it still splits the CRC
calculation - because we do not want read-only operations modifying
the underlying buffer. This is because read-only operations may not
have an exclusive access to the buffer guaranteed, and so temporary
modifications could leak out to to other processes accessing the
buffer concurrently.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Signed-off-by: Dave Chinner

Dave Chinner
2016-12-05 11:40:32 +0800
200237d67 xfs: Move AGI buffer type setting to xfs_read_agi ... Browse Code »

We've missed properly setting the buffer type for
an AGI transaction in 3 spots now, so just move it
into xfs_read_agi() and set it if we are in a transaction
to avoid the problem in the future.

This is similar to how it is done in i.e. the dir3
and attr3 read functions.

Signed-off-by: Eric Sandeen
Reviewed-by: Brian Foster
Reviewed-by: Christoph Hellwig
Signed-off-by: Dave Chinner

Eric Sandeen
2016-12-05 09:31:31 +0800
6b10b23ca xfs: set AGI buffer type in xlog_recover_clear_agi_bucket ... Browse Code »

xlog_recover_clear_agi_bucket didn't set the
type to XFS_BLFT_AGI_BUF, so we got a warning during log
replay (or an ASSERT on a debug build).

XFS (md0): Unknown buffer type 0!
XFS (md0): _xfs_buf_ioapply: no ops on block 0xaea8802/0x1

Fix this, as was done in f19b872b for 2 other locations
with the same problem.

cc: # 3.10 to current
Signed-off-by: Eric Sandeen
Reviewed-by: Brian Foster
Reviewed-by: Christoph Hellwig
Signed-off-by: Dave Chinner

Eric Sandeen
2016-12-05 09:31:06 +0800

08 Nov, 2016

1 commit

755c7bf5d libxfs: convert ushort to unsigned short ... Browse Code »

Since xfsprogs dropped ushort in favor of unsigned short, do that
here too.

Signed-off-by: Darrick J. Wong
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Darrick J. Wong
2016-11-08 08:55:48 +0800

05 Oct, 2016

2 commits

17c12bcd3 xfs: when replaying bmap operations, don't let unlinked inodes get reaped ... Browse Code »

Log recovery will iget an inode to replay BUI items and iput the inode
when it's done. Unfortunately, if the inode was unlinked, the iput
will see that i_nlink == 0 and decide to truncate & free the inode,
which prevents us from replaying subsequent BUIs. We can't skip the
BUIs because we have to replay all the redo items to ensure that
atomic operations complete.

Since unlinked inode recovery will reap the inode anyway, we can
safely introduce a new inode flag to indicate that an inode is in this
'unlinked recovery' state and should not be auto-reaped in the
drop_inode path.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig

Darrick J. Wong
2016-10-05 02:05:44 +0800
77d61fe45 xfs: log bmap intent items ... Browse Code »

Provide a mechanism for higher levels to create BUI/BUD items, submit
them to the log, and a stub function to deal with recovered BUI items.
These parts will be connected to the rmapbt in a later patch.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig

Darrick J. Wong
2016-10-05 02:05:44 +0800

04 Oct, 2016

2 commits

a90c00f05 xfs: add refcount btree block detection to log recovery ... Browse Code »

Identify refcountbt blocks in the log correctly so that we can
validate them during log recovery.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig

Darrick J. Wong
2016-10-04 00:11:23 +0800
f997ee213 xfs: log refcount intent items ... Browse Code »

Provide a mechanism for higher levels to create CUI/CUD items, submit
them to the log, and a stub function to deal with recovered CUI items.
These parts will be connected to the refcountbt in a later patch.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig

Darrick J. Wong
2016-10-04 00:11:21 +0800

26 Sep, 2016

5 commits

5cd9cee98 xfs: log recovery tracepoints to track current lsn and buffer submission ... Browse Code »

Log recovery has particular rules around buffer submission along with
tricky corner cases where independent transactions can share an LSN. As
such, it can be difficult to follow when/why buffers are submitted
during recovery.

Add a couple tracepoints to post the current LSN of a record when a new
record is being processed and when a buffer is being skipped due to LSN
ordering. Also, update the recover item class to include the LSN of the
current transaction for the item being processed.

Signed-off-by: Brian Foster
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Brian Foster
2016-09-26 06:34:52 +0800
60a4a2225 xfs: update metadata LSN in buffers during log recovery ... Browse Code »

Log recovery is currently broken for v5 superblocks in that it never
updates the metadata LSN of buffers written out during recovery. The
metadata LSN is recorded in various bits of metadata to provide recovery
ordering criteria that prevents transient corruption states reported by
buffer write verifiers. Without such ordering logic, buffer updates can
be replayed out of order and lead to false positive transient corruption
states. This is generally not a corruption vector on its own, but
corruption detection shuts down the filesystem and ultimately prevents a
mount if it occurs during log recovery. This requires an xfs_repair run
that clears the log and potentially loses filesystem updates.

This problem is avoided in most cases as metadata writes during normal
filesystem operation update the metadata LSN appropriately. The problem
with log recovery not updating metadata LSNs manifests if the system
happens to crash shortly after log recovery itself. In this scenario, it
is possible for log recovery to complete all metadata I/O such that the
filesystem is consistent. If a crash occurs after that point but before
the log tail is pushed forward by subsequent operations, however, the
next mount performs the same log recovery over again. If a buffer is
updated multiple times in the dirty range of the log, an earlier update
in the log might not be valid based on the current state of the
associated buffer after all of the updates in the log had been replayed
(before the previous crash). If a verifier happens to detect such a
problem, the filesystem claims corruption and immediately shuts down.

This commonly manifests in practice as directory block verifier failures
such as the following, likely due to directory verifiers being
particularly detailed in their checks as compared to most others:

...
Mounting V5 Filesystem
XFS (dm-0): Starting recovery (logdev: internal)
XFS (dm-0): Internal error XFS_WANT_CORRUPTED_RETURN at line ... of \
file fs/xfs/libxfs/xfs_dir2_data.c. Caller xfs_dir3_data_verify ...
...

Update log recovery to update the metadata LSN of recovered buffers.
Since metadata LSNs are already updated by write verifer functions via
attached log items, attach a dummy log item to the buffer during
validation and explicitly set the LSN of the current transaction. This
ensures that the metadata LSN of a buffer is updated based on whether
the recovery I/O actually completes, and if so, that subsequent recovery
attempts identify that the buffer is already up to date with respect to
the current transaction.

Signed-off-by: Brian Foster
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Brian Foster
2016-09-26 06:34:27 +0800
040c52c0a xfs: don't warn on buffers not being recovered due to LSN ... Browse Code »

The log recovery buffer validation function is invoked in cases where a
buffer update may be skipped due to LSN ordering. If the validation
function happens to come across directory conversion situations (e.g., a
dir3 block to data conversion), it may warn about seeing a buffer log
format of one type and a buffer with a magic number of another.

This warning is not valid as the buffer update is ultimately skipped.
This is indicated by a current_lsn of NULLCOMMITLSN provided by the
caller. As such, update xlog_recover_validate_buf_type() to only warn in
such cases when a buffer update is expected.

Signed-off-by: Brian Foster
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Brian Foster
2016-09-26 06:32:50 +0800
22db9af24 xfs: pass current lsn to log recovery buffer validation ... Browse Code »

The current LSN must be available to the buffer validation function to
provide the ability to update the metadata LSN of the buffer. Pass the
current_lsn value down to xlog_recover_validate_buf_type() in
preparation.

Signed-off-by: Brian Foster
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Brian Foster
2016-09-26 06:32:07 +0800
12818d24d xfs: rework log recovery to submit buffers on LSN boundaries ... Browse Code »

The fix to log recovery to update the metadata LSN in recovered buffers
introduces the requirement that a buffer is submitted only once per
current LSN. Log recovery currently submits buffers on transaction
boundaries. This is not sufficient as the abstraction between log
records and transactions allows for various scenarios where multiple
transactions can share the same current LSN. If independent transactions
share an LSN and both modify the same buffer, log recovery can
incorrectly skip updates and leave the filesystem in an inconsisent
state.

In preparation for proper metadata LSN updates during log recovery,
update log recovery to submit buffers for write on LSN change boundaries
rather than transaction boundaries. Explicitly track the current LSN in
a new struct xlog field to handle the various corner cases of when the
current LSN may or may not change.

Signed-off-by: Brian Foster
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Brian Foster
2016-09-26 06:22:16 +0800

03 Aug, 2016

6 commits

722e25177 xfs: remove the extents array from the rmap update done log item ... Browse Code »

Nothing ever uses the extent array in the rmap update done redo
item, so remove it before it is fixed in the on-disk log format.

Signed-off-by: Darrick J. Wong
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Darrick J. Wong
2016-08-03 10:28:43 +0800
a650e8f98 xfs: add rmap btree block detection to log recovery ... Browse Code »

Originally-From: Dave Chinner

So such blocks can be correctly identified and have their operations
structures attached to validate recovery has not resulted in a
correct block.

Signed-off-by: Dave Chinner
Reviewed-by: Brian Foster
Signed-off-by: Dave Chinner

Darrick J. Wong
2016-08-03 10:17:11 +0800
9e88b5d86 xfs: log rmap intent items ... Browse Code »

Provide a mechanism for higher levels to create RUI/RUD items, submit
them to the log, and a stub function to deal with recovered RUI items.
These parts will be connected to the rmapbt in a later patch.

Signed-off-by: Darrick J. Wong
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Darrick J. Wong
2016-08-03 10:09:48 +0800
525488520 xfs: rmap btree requires more reserved free space ... Browse Code »

Originally-From: Dave Chinner

The rmap btree is allocated from the AGFL, which means we have to
ensure ENOSPC is reported to userspace before we run out of free
space in each AG. The last allocation in an AG can cause a full
height rmap btree split, and that means we have to reserve at least
this many blocks *in each AG* to be placed on the AGFL at ENOSPC.
Update the various space calculation functions to handle this.

Also, because the macros are now executing conditional code and are
called quite frequently, convert them to functions that initialise
variables in the struct xfs_mount, use the new variables everywhere
and document the calculations better.

[darrick.wong@oracle.com: don't reserve blocks if !rmap]
[dchinner@redhat.com: update m_ag_max_usable after growfs]

Signed-off-by: Dave Chinner
Signed-off-by: Darrick J. Wong
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Darrick J. Wong
2016-08-03 09:38:24 +0800
dc42375d5 xfs: refactor redo intent item processing ... Browse Code »

Refactor the EFI intent item recovery (and cancellation) functions
into a general function that scans the AIL and an intent item type
specific handler. Move the function that recovers a single EFI item
into the extent free item code. We'll want the generalized function
when we start wiring up more redo item types.

Furthermore, ensure that log recovery only replays the redo items
that were in the AIL prior to recovery by checking the item LSN
against the largest LSN seen during log scanning. As written this
should never happen, but we can be defensive anyway.

Signed-off-by: Darrick J. Wong
Reviewed-by: Brian Foster
Signed-off-by: Dave Chinner

Darrick J. Wong
2016-08-03 09:23:49 +0800
3ab78df2a xfs: rework xfs_bmap_free callers to use xfs_defer_ops ... Browse Code »

Restructure everything that used xfs_bmap_free to use xfs_defer_ops
instead. For now we'll just remove the old symbols and play some
cpp magic to make it work; in the next patch we'll actually rename
everything.

Signed-off-by: Darrick J. Wong
Reviewed-by: Brian Foster
Signed-off-by: Dave Chinner

Darrick J. Wong
2016-08-03 09:15:38 +0800

20 May, 2016

1 commit

2a4ad5894 Merge branch 'xfs-4.7-misc-fixes' into for-next Browse Code »

Dave Chinner
2016-05-20 08:33:17 +0800

06 Apr, 2016

2 commits

664b60f6b xfs: improve kmem_realloc ... Browse Code »

Use krealloc to implement our realloc function. This helps to avoid
new allocations if we are still in the slab bucket. At least for the
bmap btree root that's actually the common case.

This also allows removing the now unused oldsize argument.

Signed-off-by: Christoph Hellwig
Reviewed-by: Brian Foster
Signed-off-by: Dave Chinner

Christoph Hellwig
2016-04-06 07:47:01 +0800
253f4911f xfs: better xfs_trans_alloc interface ... Browse Code »

Merge xfs_trans_reserve and xfs_trans_alloc into a single function call
that returns a transaction with all the required log and block reservations,
and which allows passing transaction flags directly to avoid the cumbersome
_xfs_trans_alloc interface.

While we're at it we also get rid of the transaction type argument that has
been superflous since we stopped supporting the non-CIL logging mode. The
guts of it will be removed in another patch.

[dchinner: fixed transaction leak in error path in xfs_setattr_nonsize]

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Christoph Hellwig
2016-04-06 07:19:55 +0800

09 Mar, 2016

1 commit

ab9d1e4f7 Merge branch 'xfs-misc-fixes-4.6-3' into for-next Browse Code »

Dave Chinner
2016-03-09 05:18:30 +0800

07 Mar, 2016

3 commits

7f0ed5461 Merge branch 'xfs-buf-macro-cleanup-4.6' into for-next Browse Code »

Dave Chinner
2016-03-07 06:31:00 +0800
a2bbcb60f Merge branch 'xfs-gut-icdinode-4.6' into for-next Browse Code »

Dave Chinner
2016-03-07 06:30:32 +0800
c53473be4 Merge branch 'xfs-rt-fixes-4.6' into for-next Browse Code »

Dave Chinner
2016-03-07 06:29:04 +0800