Commit f25f624263445785b94f39739a6339ba9ed3275d
1 parent
4c4d390122
Exists in
master
and in
7 other branches
ext3: Avoid filesystem corruption after a crash under heavy delete load
It can happen that ext3_free_branches calls ext3_forget() for an indirect block in an earlier transaction than a transaction in which we clear pointer to this indirect block. Thus if we crash before a transaction clearing the block pointer is committed, we will see indirect block pointing to already freed blocks and complain during orphan list cleanup. The fix is simple: Make sure ext3_forget() is called in the transaction doing block pointer clearing. This is a backport of an ext4 fix by Amir G. <amir73il@users.sourceforge.net> Signed-off-by: Jan Kara <jack@suse.cz>
Showing 1 changed file with 25 additions and 21 deletions Side-by-side Diff
fs/ext3/inode.c
... | ... | @@ -2270,27 +2270,6 @@ |
2270 | 2270 | depth); |
2271 | 2271 | |
2272 | 2272 | /* |
2273 | - * We've probably journalled the indirect block several | |
2274 | - * times during the truncate. But it's no longer | |
2275 | - * needed and we now drop it from the transaction via | |
2276 | - * journal_revoke(). | |
2277 | - * | |
2278 | - * That's easy if it's exclusively part of this | |
2279 | - * transaction. But if it's part of the committing | |
2280 | - * transaction then journal_forget() will simply | |
2281 | - * brelse() it. That means that if the underlying | |
2282 | - * block is reallocated in ext3_get_block(), | |
2283 | - * unmap_underlying_metadata() will find this block | |
2284 | - * and will try to get rid of it. damn, damn. | |
2285 | - * | |
2286 | - * If this block has already been committed to the | |
2287 | - * journal, a revoke record will be written. And | |
2288 | - * revoke records must be emitted *before* clearing | |
2289 | - * this block's bit in the bitmaps. | |
2290 | - */ | |
2291 | - ext3_forget(handle, 1, inode, bh, bh->b_blocknr); | |
2292 | - | |
2293 | - /* | |
2294 | 2273 | * Everything below this this pointer has been |
2295 | 2274 | * released. Now let this top-of-subtree go. |
2296 | 2275 | * |
... | ... | @@ -2312,6 +2291,31 @@ |
2312 | 2291 | ext3_mark_inode_dirty(handle, inode); |
2313 | 2292 | truncate_restart_transaction(handle, inode); |
2314 | 2293 | } |
2294 | + | |
2295 | + /* | |
2296 | + * We've probably journalled the indirect block several | |
2297 | + * times during the truncate. But it's no longer | |
2298 | + * needed and we now drop it from the transaction via | |
2299 | + * journal_revoke(). | |
2300 | + * | |
2301 | + * That's easy if it's exclusively part of this | |
2302 | + * transaction. But if it's part of the committing | |
2303 | + * transaction then journal_forget() will simply | |
2304 | + * brelse() it. That means that if the underlying | |
2305 | + * block is reallocated in ext3_get_block(), | |
2306 | + * unmap_underlying_metadata() will find this block | |
2307 | + * and will try to get rid of it. damn, damn. Thus | |
2308 | + * we don't allow a block to be reallocated until | |
2309 | + * a transaction freeing it has fully committed. | |
2310 | + * | |
2311 | + * We also have to make sure journal replay after a | |
2312 | + * crash does not overwrite non-journaled data blocks | |
2313 | + * with old metadata when the block got reallocated for | |
2314 | + * data. Thus we have to store a revoke record for a | |
2315 | + * block in the same transaction in which we free the | |
2316 | + * block. | |
2317 | + */ | |
2318 | + ext3_forget(handle, 1, inode, bh, bh->b_blocknr); | |
2315 | 2319 | |
2316 | 2320 | ext3_free_blocks(handle, inode, nr, 1); |
2317 | 2321 |