Commit 666d644cd72a9ec58b353209ff191d7430f3b357

Authored by Dave Chinner
Committed by Ben Myers
1 parent 3d6e036193

xfs: don't free EFIs before the EFDs are committed

Filesystems are occasionally being shut down with this error:

xfs_trans_ail_delete_bulk: attempting to delete a log item that is
not in the AIL.

It was diagnosed to be related to the EFI/EFD commit order when the
EFI and EFD are in different checkpoints and the EFD is committed
before the EFI here:

http://oss.sgi.com/archives/xfs/2013-01/msg00082.html

The real problem is that a single bit cannot fully describe the
states that the EFI/EFD processing can be in. These completion
states are:

EFI			EFI in AIL	EFD		Result
committed/unpinned	Yes		committed	OK
committed/pinned	No		committed	Shutdown
uncommitted		No		committed	Shutdown


Note that the "result" field is what should happen, not what does
happen. The current logic is broken and handles the first two cases
correctly by luck.  That is, the code will free the EFI if the
XFS_EFI_COMMITTED bit is *not* set, rather than if it is set. The
inverted logic "works" because if both EFI and EFD are committed,
then the first __xfs_efi_release() call clears the XFS_EFI_COMMITTED
bit, and the second frees the EFI item. Hence as long as
xfs_efi_item_committed() has been called, everything appears to be
fine.

It is the third case where the logic fails - where
xfs_efd_item_committed() is called before xfs_efi_item_committed(),
and that results in the EFI being freed before it has been
committed. That is the bug that triggered the shutdown, and hence
keeping track of whether the EFI has been committed or not is
insufficient to correctly order the EFI/EFD operations w.r.t. the
AIL.

What we really want is this: the EFI is always placed into the
AIL before the last reference goes away. The only way to guarantee
that is that the EFI is not freed until after it has been unpinned
*and* the EFD has been committed. That is, restructure the logic so
that the only case that can occur is the first case.

This can be done easily by replacing the XFS_EFI_COMMITTED with an
EFI reference count. The EFI is initialised with it's own count, and
that is not released until it is unpinned. However, there is a
complication to this method - the high level EFI/EFD code in
xfs_bmap_finish() does not hold direct references to the EFI
structure, and runs a transaction commit between the EFI and EFD
processing. Hence the EFI can be freed even before the EFD is
created using such a method.

Further, log recovery uses the AIL for tracking EFI/EFDs that need
to be recovered, but it uses the AIL *differently* to the EFI
transaction commit. Hence log recovery never pins or unpins EFIs, so
we can't drop the EFI reference count indirectly to free the EFI.

However, this doesn't prevent us from using a reference count here.
There is a 1:1 relationship between EFIs and EFDs, so when we
initialise the EFI we can take a reference count for the EFD as
well. This solves the xfs_bmap_finish() issue - the EFI will never
be freed until the EFD is processed. In terms of log recovery,
during the committing of the EFD we can look for the
XFS_EFI_RECOVERED bit being set and drop the EFI reference as well,
thereby ensuring everything works correctly there as well.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

Showing 3 changed files with 23 additions and 19 deletions Side-by-side Diff

fs/xfs/xfs_extfree_item.c
... ... @@ -50,9 +50,8 @@
50 50 * Freeing the efi requires that we remove it from the AIL if it has already
51 51 * been placed there. However, the EFI may not yet have been placed in the AIL
52 52 * when called by xfs_efi_release() from EFD processing due to the ordering of
53   - * committed vs unpin operations in bulk insert operations. Hence the
54   - * test_and_clear_bit(XFS_EFI_COMMITTED) to ensure only the last caller frees
55   - * the EFI.
  53 + * committed vs unpin operations in bulk insert operations. Hence the reference
  54 + * count to ensure only the last caller frees the EFI.
56 55 */
57 56 STATIC void
58 57 __xfs_efi_release(
... ... @@ -60,7 +59,7 @@
60 59 {
61 60 struct xfs_ail *ailp = efip->efi_item.li_ailp;
62 61  
63   - if (!test_and_clear_bit(XFS_EFI_COMMITTED, &efip->efi_flags)) {
  62 + if (atomic_dec_and_test(&efip->efi_refcount)) {
64 63 spin_lock(&ailp->xa_lock);
65 64 /* xfs_trans_ail_delete() drops the AIL lock. */
66 65 xfs_trans_ail_delete(ailp, &efip->efi_item,
... ... @@ -126,8 +125,8 @@
126 125 * which the EFI is manipulated during a transaction. If we are being asked to
127 126 * remove the EFI it's because the transaction has been cancelled and by
128 127 * definition that means the EFI cannot be in the AIL so remove it from the
129   - * transaction and free it. Otherwise coordinate with xfs_efi_release() (via
130   - * XFS_EFI_COMMITTED) to determine who gets to free the EFI.
  128 + * transaction and free it. Otherwise coordinate with xfs_efi_release()
  129 + * to determine who gets to free the EFI.
131 130 */
132 131 STATIC void
133 132 xfs_efi_item_unpin(
134 133  
... ... @@ -171,19 +170,13 @@
171 170  
172 171 /*
173 172 * The EFI is logged only once and cannot be moved in the log, so simply return
174   - * the lsn at which it's been logged. For bulk transaction committed
175   - * processing, the EFI may be processed but not yet unpinned prior to the EFD
176   - * being processed. Set the XFS_EFI_COMMITTED flag so this case can be detected
177   - * when processing the EFD.
  173 + * the lsn at which it's been logged.
178 174 */
179 175 STATIC xfs_lsn_t
180 176 xfs_efi_item_committed(
181 177 struct xfs_log_item *lip,
182 178 xfs_lsn_t lsn)
183 179 {
184   - struct xfs_efi_log_item *efip = EFI_ITEM(lip);
185   -
186   - set_bit(XFS_EFI_COMMITTED, &efip->efi_flags);
187 180 return lsn;
188 181 }
189 182  
... ... @@ -241,6 +234,7 @@
241 234 efip->efi_format.efi_nextents = nextents;
242 235 efip->efi_format.efi_id = (__psint_t)(void*)efip;
243 236 atomic_set(&efip->efi_next_extent, 0);
  237 + atomic_set(&efip->efi_refcount, 2);
244 238  
245 239 return efip;
246 240 }
247 241  
... ... @@ -310,8 +304,13 @@
310 304 uint nextents)
311 305 {
312 306 ASSERT(atomic_read(&efip->efi_next_extent) >= nextents);
313   - if (atomic_sub_and_test(nextents, &efip->efi_next_extent))
  307 + if (atomic_sub_and_test(nextents, &efip->efi_next_extent)) {
314 308 __xfs_efi_release(efip);
  309 +
  310 + /* recovery needs us to drop the EFI reference, too */
  311 + if (test_bit(XFS_EFI_RECOVERED, &efip->efi_flags))
  312 + __xfs_efi_release(efip);
  313 + }
315 314 }
316 315  
317 316 static inline struct xfs_efd_log_item *EFD_ITEM(struct xfs_log_item *lip)
fs/xfs/xfs_extfree_item.h
... ... @@ -114,16 +114,20 @@
114 114 * Define EFI flag bits. Manipulated by set/clear/test_bit operators.
115 115 */
116 116 #define XFS_EFI_RECOVERED 1
117   -#define XFS_EFI_COMMITTED 2
118 117  
119 118 /*
120   - * This is the "extent free intention" log item. It is used
121   - * to log the fact that some extents need to be free. It is
122   - * used in conjunction with the "extent free done" log item
123   - * described below.
  119 + * This is the "extent free intention" log item. It is used to log the fact
  120 + * that some extents need to be free. It is used in conjunction with the
  121 + * "extent free done" log item described below.
  122 + *
  123 + * The EFI is reference counted so that it is not freed prior to both the EFI
  124 + * and EFD being committed and unpinned. This ensures that when the last
  125 + * reference goes away the EFI will always be in the AIL as it has been
  126 + * unpinned, regardless of whether the EFD is processed before or after the EFI.
124 127 */
125 128 typedef struct xfs_efi_log_item {
126 129 xfs_log_item_t efi_item;
  130 + atomic_t efi_refcount;
127 131 atomic_t efi_next_extent;
128 132 unsigned long efi_flags; /* misc flags */
129 133 xfs_efi_log_format_t efi_format;
fs/xfs/xfs_log_recover.c
... ... @@ -2948,6 +2948,7 @@
2948 2948 * This will pull the EFI from the AIL and
2949 2949 * free the memory associated with it.
2950 2950 */
  2951 + set_bit(XFS_EFI_RECOVERED, &efip->efi_flags);
2951 2952 xfs_efi_release(efip, efip->efi_format.efi_nextents);
2952 2953 return XFS_ERROR(EIO);
2953 2954 }