Commit 721a769c034204e8e5f6ca4ecbff249e0225333b

Authored by Jeff Mahoney
Committed by Jan Kara
1 parent 7bc9cc07ee

reiserfs: fix race with flush_used_journal_lists and flush_journal_list

There are two locks involved in managing the journal lists. The general
reiserfs_write_lock and the journal->j_flush_mutex.

While flush_journal_list is sleeping to acquire the j_flush_mutex or to
submit a block for write, it will drop the write lock. This allows
another thread to acquire the write lock and ultimately call
flush_used_journal_lists to traverse the list of journal lists and
select one for flushing. It can select the journal_list that has just
had flush_journal_list called on it in the original thread and call it
again with the same journal_list.

The second thread then drops the write lock to acquire j_flush_mutex and
the first thread reacquires it and continues execution and eventually
clears and frees the journal list before dropping j_flush_mutex and
returning.

The second thread acquires j_flush_mutex and ends up operating on a
journal_list that has already been released. If the memory hasn't
been reused, we'll soon after hit a BUG_ON because the transaction id
has already been cleared. If it's been reused, we'll crash in other
fun ways.

Since flush_journal_list will synchronize on j_flush_mutex, we can fix
the race by taking a proper reference in flush_used_journal_lists
and checking to see if it's still valid after the mutex is taken. It's
safe to iterate the list of journal lists and pick a list with
just the write lock as long as a reference is taken on the journal list
before we drop the lock. We already have code to handle whether a
transaction has been flushed already so we can use that to handle the
race and get rid of the trans_id BUG_ON.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>

Showing 1 changed file with 4 additions and 1 deletions Side-by-side Diff

fs/reiserfs/journal.c
... ... @@ -1338,7 +1338,6 @@
1338 1338 reiserfs_warning(s, "clm-2048", "called with wcount %d",
1339 1339 atomic_read(&journal->j_wcount));
1340 1340 }
1341   - BUG_ON(jl->j_trans_id == 0);
1342 1341  
1343 1342 /* if flushall == 0, the lock is already held */
1344 1343 if (flushall) {
... ... @@ -1765,6 +1764,8 @@
1765 1764 break;
1766 1765 tjl = JOURNAL_LIST_ENTRY(tjl->j_list.next);
1767 1766 }
  1767 + get_journal_list(jl);
  1768 + get_journal_list(flush_jl);
1768 1769 /* try to find a group of blocks we can flush across all the
1769 1770 ** transactions, but only bother if we've actually spanned
1770 1771 ** across multiple lists
... ... @@ -1773,6 +1774,8 @@
1773 1774 ret = kupdate_transactions(s, jl, &tjl, &trans_id, len, i);
1774 1775 }
1775 1776 flush_journal_list(s, flush_jl, 1);
  1777 + put_journal_list(s, flush_jl);
  1778 + put_journal_list(s, jl);
1776 1779 return 0;
1777 1780 }
1778 1781