mtd: driver _read() returns max_bitflips; mtd_read() returns -EUCLEAN

Linux modified the MTD driver interface in commit edbc4540 (with the same name as this commit). The effect is that calls to mtd_read will not return -EUCLEAN if the number of ECC-corrected bit errors is below a certain threshold, which defaults to the strength of the ECC. This allows -EUCLEAN to stop indicating "some bits were corrected" and begin indicating "a large number of bits were corrected, the data held in this region of flash may be lost soon". UBI makes use of this and when -EUCLEAN is returned from mtd_read it will move data to another block of flash. Without adopting this interface change UBI on U-boot attempts to move data between blocks every time a single bit is corrected using the ECC, which is a very common occurance on some devices. For some devices where bit errors are common enough, UBI can get stuck constantly moving data around because each block it attempts to use has a single bit error. This condition is hit when wear_leveling_worker attempts to move data from one PEB to another in response to an -EUCLEAN/UBI_IO_BITFLIPS error. When this happens ubi_eba_copy_leb is called to perform the data copy, and after the data is written it is read back to check its validity. If that read returns UBI_IO_BITFLIPS (in response to an MTD -EUCLEAN) then ubi_eba_copy_leb returns 1 to wear_leveling worker, which then proceeds to schedule the destination PEB for erasure. This leads to erase_worker running on the PEB, and following a successful erase wear_leveling_worker is called which begins this whole cycle all over again. The end result is that (without UBI debug output enabled) the boot appears to simply hang whilst in reality U-boot busily works away at destroying a block of the NAND flash. Debug output from this situation: UBI DBG: ensure_wear_leveling: schedule scrubbing UBI DBG: wear_leveling_worker: scrub PEB 1027 to PEB 4083 UBI DBG: ubi_io_read_vid_hdr: read VID header from PEB 1027 UBI DBG: ubi_io_read: read 4096 bytes from PEB 1027:4096 UBI DBG: ubi_eba_copy_leb: copy LEB 0:0, PEB 1027 to PEB 4083 UBI DBG: ubi_eba_copy_leb: read 1040384 bytes of data UBI DBG: ubi_io_read: read 1040384 bytes from PEB 1027:8192 UBI: fixable bit-flip detected at PEB 1027 UBI DBG: ubi_io_write_vid_hdr: write VID header to PEB 4083 UBI DBG: ubi_io_write: write 4096 bytes to PEB 4083:4096 UBI DBG: ubi_io_read_vid_hdr: read VID header from PEB 4083 UBI DBG: ubi_io_read: read 4096 bytes from PEB 4083:4096 UBI DBG: ubi_io_write: write 4096 bytes to PEB 4083:8192 UBI DBG: ubi_io_read: read 4096 bytes from PEB 4083:8192 UBI: fixable bit-flip detected at PEB 4083 UBI DBG: schedule_erase: schedule erasure of PEB 4083, EC 55, torture 0 UBI DBG: erase_worker: erase PEB 4083 EC 55 UBI DBG: sync_erase: erase PEB 4083, old EC 55 UBI DBG: do_sync_erase: erase PEB 4083 UBI DBG: sync_erase: erased PEB 4083, new EC 56 UBI DBG: ubi_io_write_ec_hdr: write EC header to PEB 4083 UBI DBG: ubi_io_write: write 4096 bytes to PEB 4083:0 UBI DBG: ensure_wear_leveling: schedule scrubbing UBI DBG: wear_leveling_worker: scrub PEB 1027 to PEB 4083 ... This patch adopts the interface change as in Linux commit edbc4540 in order to avoid such situations. Given that none of the drivers under drivers/mtd return -EUCLEAN, this should only affect those using software ECC. I have tested that it works on a board which is currently out of tree, but which I hope to be able to begin upstreaming soon. Signed-off-by: Paul Burton <paul.burton@imgtec.com> Acked-by: Stefan Roese <sr@denx.de>

mtd: driver _read() returns max_bitflips; mtd_read() returns -EUCLEAN
Linux modified the MTD driver interface in commit edbc4540 (with the same name as this commit). The effect is that calls to mtd_read will not return -EUCLEAN if the number of ECC-corrected bit errors is below a certain threshold, which defaults to the strength of the ECC. This allows -EUCLEAN to stop indicating "some bits were corrected" and begin indicating "a large number of bits were corrected, the data held in this region of flash may be lost soon". UBI makes use of this and when -EUCLEAN is returned from mtd_read it will move data to another block of flash. Without adopting this interface change UBI on U-boot attempts to move data between blocks every time a single bit is corrected using the ECC, which is a very common occurance on some devices. For some devices where bit errors are common enough, UBI can get stuck constantly moving data around because each block it attempts to use has a single bit error. This condition is hit when wear_leveling_worker attempts to move data from one PEB to another in response to an -EUCLEAN/UBI_IO_BITFLIPS error. When this happens ubi_eba_copy_leb is called to perform the data copy, and after the data is written it is read back to check its validity. If that read returns UBI_IO_BITFLIPS (in response to an MTD -EUCLEAN) then ubi_eba_copy_leb returns 1 to wear_leveling worker, which then proceeds to schedule the destination PEB for erasure. This leads to erase_worker running on the PEB, and following a successful erase wear_leveling_worker is called which begins this whole cycle all over again. The end result is that (without UBI debug output enabled) the boot appears to simply hang whilst in reality U-boot busily works away at destroying a block of the NAND flash. Debug output from this situation: UBI DBG: ensure_wear_leveling: schedule scrubbing UBI DBG: wear_leveling_worker: scrub PEB 1027 to PEB 4083 UBI DBG: ubi_io_read_vid_hdr: read VID header from PEB 1027 UBI DBG: ubi_io_read: read 4096 bytes from PEB 1027:4096 UBI DBG: ubi_eba_copy_leb: copy LEB 0:0, PEB 1027 to PEB 4083 UBI DBG: ubi_eba_copy_leb: read 1040384 bytes of data UBI DBG: ubi_io_read: read 1040384 bytes from PEB 1027:8192 UBI: fixable bit-flip detected at PEB 1027 UBI DBG: ubi_io_write_vid_hdr: write VID header to PEB 4083 UBI DBG: ubi_io_write: write 4096 bytes to PEB 4083:4096 UBI DBG: ubi_io_read_vid_hdr: read VID header from PEB 4083 UBI DBG: ubi_io_read: read 4096 bytes from PEB 4083:4096 UBI DBG: ubi_io_write: write 4096 bytes to PEB 4083:8192 UBI DBG: ubi_io_read: read 4096 bytes from PEB 4083:8192 UBI: fixable bit-flip detected at PEB 4083 UBI DBG: schedule_erase: schedule erasure of PEB 4083, EC 55, torture 0 UBI DBG: erase_worker: erase PEB 4083 EC 55 UBI DBG: sync_erase: erase PEB 4083, old EC 55 UBI DBG: do_sync_erase: erase PEB 4083 UBI DBG: sync_erase: erased PEB 4083, new EC 56 UBI DBG: ubi_io_write_ec_hdr: write EC header to PEB 4083 UBI DBG: ubi_io_write: write 4096 bytes to PEB 4083:0 UBI DBG: ensure_wear_leveling: schedule scrubbing UBI DBG: wear_leveling_worker: scrub PEB 1027 to PEB 4083 ... This patch adopts the interface change as in Linux commit edbc4540 in order to avoid such situations. Given that none of the drivers under drivers/mtd return -EUCLEAN, this should only affect those using software ECC. I have tested that it works on a board which is currently out of tree, but which I hope to be able to begin upstreaming soon. Signed-off-by: Paul Burton <paul.burton@imgtec.com> Acked-by: Stefan Roese <sr@denx.de>
Paul Burton · Scott Wood
1 parent b770e88a6c
Showing 5 changed files with 38 additions and 12 deletions Side-by-side Diff
drivers/mtd/mtdcore.c
drivers/mtd/mtdpart.c
drivers/mtd/nand/nand_base.c
drivers/mtd/onenand/onenand_base.c
include/linux/mtd/nand.h
@@ -217,11 +217,23 @@
 int mtd_read(struct mtd_info *mtd, loff_t from, size_t len, size_t *retlen,
 	     u_char *buf)
 {
+	int ret_code;
 	if (from < 0 || from > mtd->size || len > mtd->size - from)
 		return -EINVAL;
 	if (!len)
 		return 0;
-	return mtd->_read(mtd, from, len, retlen, buf);
+
+	/*
+	 * In the absence of an error, drivers return a non-negative integer
+	 * representing the maximum number of bitflips that were corrected on
+	 * any one ecc region (if applicable; zero otherwise).
+	 */
+	ret_code = mtd->_read(mtd, from, len, retlen, buf);
+	if (unlikely(ret_code < 0))
+		return ret_code;
+	if (mtd->ecc_strength == 0)
+		return 0;	/* device lacks ecc */
+	return ret_code >= mtd->bitflip_threshold ? -EUCLEAN : 0;
 }
  
 int mtd_write(struct mtd_info *mtd, loff_t to, size_t len, size_t *retlen,
@@ -53,12 +53,12 @@
  
 	stats = part->master->ecc_stats;
 	res = mtd_read(part->master, from + part->offset, len, retlen, buf);
-	if (unlikely(res)) {
-		if (mtd_is_bitflip(res))
-			mtd->ecc_stats.corrected += part->master->ecc_stats.corrected - stats.corrected;
-		if (mtd_is_eccerr(res))
-			mtd->ecc_stats.failed += part->master->ecc_stats.failed - stats.failed;
-	}
+	if (unlikely(mtd_is_eccerr(res)))
+		mtd->ecc_stats.failed +=
+			part->master->ecc_stats.failed - stats.failed;
+	else
+		mtd->ecc_stats.corrected +=
+			part->master->ecc_stats.corrected - stats.corrected;
 	return res;
 }
  
@@ -1238,6 +1238,7 @@
 		mtd->oobavail : mtd->oobsize;
  
 	uint8_t *bufpoi, *oob, *buf;
+	unsigned int max_bitflips = 0;
  
 	stats = mtd->ecc_stats;
  
@@ -1265,7 +1266,10 @@
  
 			chip->cmdfunc(mtd, NAND_CMD_READ0, 0x00, page);
  
-			/* Now read the page into the buffer */
+			/*
+			 * Now read the page into the buffer.  Absent an error,
+			 * the read methods return max bitflips per ecc step.
+			 */
 			if (unlikely(ops->mode == MTD_OPS_RAW))
 				ret = chip->ecc.read_page_raw(mtd, chip, bufpoi,
 							      oob_required,
  
  
  
@@ -1284,15 +1288,19 @@
 				break;
 			}
  
+			max_bitflips = max_t(unsigned int, max_bitflips, ret);
+
 			/* Transfer not aligned data */
 			if (!aligned) {
 				if (!NAND_HAS_SUBPAGE_READ(chip) && !oob &&
 				    !(mtd->ecc_stats.failed - stats.failed) &&
-				    (ops->mode != MTD_OPS_RAW))
+				    (ops->mode != MTD_OPS_RAW)) {
 					chip->pagebuf = realpage;
-				else
+					chip->pagebuf_bitflips = ret;
+				} else {
 					/* Invalidate page cache */
 					chip->pagebuf = -1;
+				}
 				memcpy(buf, chip->buffers->databuf + col, bytes);
 			}
  
@@ -1310,6 +1318,8 @@
 		} else {
 			memcpy(buf, chip->buffers->databuf + col, bytes);
 			buf += bytes;
+			max_bitflips = max_t(unsigned int, max_bitflips,
+					     chip->pagebuf_bitflips);
 		}
  
 		readlen -= bytes;
@@ -1341,7 +1351,7 @@
 	if (mtd->ecc_stats.failed - stats.failed)
 		return -EBADMSG;
  
-	return  mtd->ecc_stats.corrected - stats.corrected ? -EUCLEAN : 0;
+	return max_bitflips;
 }
  
 /**
@@ -969,7 +969,8 @@
 	if (mtd->ecc_stats.failed - stats.failed)
 		return -EBADMSG;
  
-	return mtd->ecc_stats.corrected - stats.corrected ? -EUCLEAN : 0;
+	/* return max bitflips per ecc step; ONENANDs correct 1 bit only */
+	return mtd->ecc_stats.corrected != stats.corrected ? 1 : 0;
 }
  
 /**
@@ -464,6 +464,8 @@
  * @pagemask:		[INTERN] page number mask = number of (pages / chip) - 1
  * @pagebuf:		[INTERN] holds the pagenumber which is currently in
  *			data_buf.
+ * @pagebuf_bitflips:	[INTERN] holds the bitflip count for the page which is
+ *			currently in data_buf.
  * @subpagesize:	[INTERN] holds the subpagesize
  * @onfi_version:	[INTERN] holds the chip ONFI version (BCD encoded),
  *			non 0 if ONFI supported.
@@ -531,6 +533,7 @@
 	uint64_t chipsize;
 	int pagemask;
 	int pagebuf;
+	unsigned int pagebuf_bitflips;
 	int subpagesize;
 	uint8_t cellinfo;
 	int badblockpos;
...	...	@@ -217,11 +217,23 @@
217	217	int mtd_read(struct mtd_info mtd, loff_t from, size_t len, size_t retlen,
218	218	u_char *buf)
219	219	{
	220	+ int ret_code;
220	221	if (from < 0 \|\| from > mtd->size \|\| len > mtd->size - from)
221	222	return -EINVAL;
222	223	if (!len)
223	224	return 0;
224		- return mtd->_read(mtd, from, len, retlen, buf);
	225	+
	226	+ /*
	227	+ * In the absence of an error, drivers return a non-negative integer
	228	+ * representing the maximum number of bitflips that were corrected on
	229	+ * any one ecc region (if applicable; zero otherwise).
	230	+ */
	231	+ ret_code = mtd->_read(mtd, from, len, retlen, buf);
	232	+ if (unlikely(ret_code < 0))
	233	+ return ret_code;
	234	+ if (mtd->ecc_strength == 0)
	235	+ return 0; /* device lacks ecc */
	236	+ return ret_code >= mtd->bitflip_threshold ? -EUCLEAN : 0;
225	237	}
226	238
227	239	int mtd_write(struct mtd_info mtd, loff_t to, size_t len, size_t retlen,
...	...	@@ -53,12 +53,12 @@
53	53
54	54	stats = part->master->ecc_stats;
55	55	res = mtd_read(part->master, from + part->offset, len, retlen, buf);
56		- if (unlikely(res)) {
57		- if (mtd_is_bitflip(res))
58		- mtd->ecc_stats.corrected += part->master->ecc_stats.corrected - stats.corrected;
59		- if (mtd_is_eccerr(res))
60		- mtd->ecc_stats.failed += part->master->ecc_stats.failed - stats.failed;
61		- }
	56	+ if (unlikely(mtd_is_eccerr(res)))
	57	+ mtd->ecc_stats.failed +=
	58	+ part->master->ecc_stats.failed - stats.failed;
	59	+ else
	60	+ mtd->ecc_stats.corrected +=
	61	+ part->master->ecc_stats.corrected - stats.corrected;
62	62	return res;
63	63	}
64	64
...	...	@@ -1238,6 +1238,7 @@
1238	1238	mtd->oobavail : mtd->oobsize;
1239	1239
1240	1240	uint8_t bufpoi, oob, *buf;
	1241	+ unsigned int max_bitflips = 0;
1241	1242
1242	1243	stats = mtd->ecc_stats;
1243	1244
...	...	@@ -1265,7 +1266,10 @@
1265	1266
1266	1267	chip->cmdfunc(mtd, NAND_CMD_READ0, 0x00, page);
1267	1268
1268		- /* Now read the page into the buffer */
	1269	+ /*
	1270	+ * Now read the page into the buffer. Absent an error,
	1271	+ * the read methods return max bitflips per ecc step.
	1272	+ */
1269	1273	if (unlikely(ops->mode == MTD_OPS_RAW))
1270	1274	ret = chip->ecc.read_page_raw(mtd, chip, bufpoi,
1271	1275	oob_required,
1272	1276
1273	1277
1274	1278
...	...	@@ -1284,15 +1288,19 @@
1284	1288	break;
1285	1289	}
1286	1290
	1291	+ max_bitflips = max_t(unsigned int, max_bitflips, ret);
	1292	+
1287	1293	/* Transfer not aligned data */
1288	1294	if (!aligned) {
1289	1295	if (!NAND_HAS_SUBPAGE_READ(chip) && !oob &&
1290	1296	!(mtd->ecc_stats.failed - stats.failed) &&
1291		- (ops->mode != MTD_OPS_RAW))
	1297	+ (ops->mode != MTD_OPS_RAW)) {
1292	1298	chip->pagebuf = realpage;
1293		- else
	1299	+ chip->pagebuf_bitflips = ret;
	1300	+ } else {
1294	1301	/* Invalidate page cache */
1295	1302	chip->pagebuf = -1;
	1303	+ }
1296	1304	memcpy(buf, chip->buffers->databuf + col, bytes);
1297	1305	}
1298	1306
...	...	@@ -1310,6 +1318,8 @@
1310	1318	} else {
1311	1319	memcpy(buf, chip->buffers->databuf + col, bytes);
1312	1320	buf += bytes;
	1321	+ max_bitflips = max_t(unsigned int, max_bitflips,
	1322	+ chip->pagebuf_bitflips);
1313	1323	}
1314	1324
1315	1325	readlen -= bytes;
...	...	@@ -1341,7 +1351,7 @@
1341	1351	if (mtd->ecc_stats.failed - stats.failed)
1342	1352	return -EBADMSG;
1343	1353
1344		- return mtd->ecc_stats.corrected - stats.corrected ? -EUCLEAN : 0;
	1354	+ return max_bitflips;
1345	1355	}
1346	1356
1347	1357	/**
...	...	@@ -969,7 +969,8 @@
969	969	if (mtd->ecc_stats.failed - stats.failed)
970	970	return -EBADMSG;
971	971
972		- return mtd->ecc_stats.corrected - stats.corrected ? -EUCLEAN : 0;
	972	+ /* return max bitflips per ecc step; ONENANDs correct 1 bit only */
	973	+ return mtd->ecc_stats.corrected != stats.corrected ? 1 : 0;
973	974	}
974	975
975	976	/**
...	...	@@ -464,6 +464,8 @@
464	464	* @pagemask: [INTERN] page number mask = number of (pages / chip) - 1
465	465	* @pagebuf: [INTERN] holds the pagenumber which is currently in
466	466	* data_buf.
	467	+ * @pagebuf_bitflips: [INTERN] holds the bitflip count for the page which is
	468	+ * currently in data_buf.
467	469	* @subpagesize: [INTERN] holds the subpagesize
468	470	* @onfi_version: [INTERN] holds the chip ONFI version (BCD encoded),
469	471	* non 0 if ONFI supported.
...	...	@@ -531,6 +533,7 @@
531	533	uint64_t chipsize;
532	534	int pagemask;
533	535	int pagebuf;
	536	+ unsigned int pagebuf_bitflips;
534	537	int subpagesize;
535	538	uint8_t cellinfo;
536	539	int badblockpos;