Eric Lee / smarc-fsl-linux-kernel

07 May, 2019

1 commit

75c89bef6 lightnvm: pblk: ensure that erase is chunk aligned

The sector bits in the erase command may be uninitialized, causing the
erase LBA to be unaligned to the chunk size.

This is an unexpected situation, since erase shall always be chunk
aligned based on the OCSSD 2.0 specification.
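
As an illustration only (this is not the kernel patch), chunk alignment
can be sketched as clearing the sector field of the address. The sketch
assumes the sector field occupies the lowest bits of the LBA; the names
align_erase_lba and sec_bits are hypothetical.

```python
def align_erase_lba(lba: int, sec_bits: int) -> int:
    """Clear the low `sec_bits` bits so the LBA points at the start of a chunk."""
    sector_mask = (1 << sec_bits) - 1
    return lba & ~sector_mask


# Assumed geometry: 12 sector bits, so a chunk spans 4096 sectors.
unaligned = (5 << 12) | 0x2A   # chunk 5 with stray sector bits set
aligned = align_erase_lba(unaligned, sec_bits=12)
```

Applying the function to an already aligned address leaves it unchanged,
so it is safe to call unconditionally before issuing an erase.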

Signed-off-by: Igor Konopko
Reviewed-by: Javier González
Reviewed-by: Hans Holmberg
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Igor Konopko
2019-05-07 00:19:17 +0800

11 Feb, 2019

1 commit

0586942f0 lightnvm: pblk: fix race condition on GC

This patch fixes a race condition where a write is mapped to the last
sectors of a line. The write is synced to the device but the L2P is not
updated yet. When the line is garbage collected before the L2P update
is performed, the sectors are ignored by the GC logic and the line is
freed before all sectors are moved. When the L2P is finally updated, it
contains a mapping to a freed line, and subsequent reads of the
corresponding LBAs fail.

This patch introduces a per line counter specifying the number of
sectors that are synced to the device but have not been updated in the
L2P. Lines with a counter greater than zero will not be selected
for GC.
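
The counter logic described above might be modelled roughly as follows.
This is a simplified sketch, not the pblk code: the Line class, the
pending_l2p field and the helper names are all hypothetical, and
locking is omitted.

```python
from dataclasses import dataclass


@dataclass
class Line:
    line_id: int
    pending_l2p: int = 0  # sectors synced to the device but not yet in the L2P


def on_write_synced(line: Line, nr_sectors: int) -> None:
    """The device acknowledged the write; the L2P update is still pending."""
    line.pending_l2p += nr_sectors


def on_l2p_updated(line: Line, nr_sectors: int) -> None:
    """The L2P now maps these sectors, so GC can see them."""
    line.pending_l2p -= nr_sectors


def gc_candidates(lines):
    """Only lines with no pending L2P updates may be garbage collected."""
    return [line for line in lines if line.pending_l2p == 0]


a, b = Line(0), Line(1)
on_write_synced(b, 4)               # last sectors of line 1 are synced
blocked = gc_candidates([a, b])     # line 1 is skipped for now
on_l2p_updated(b, 4)
unblocked = gc_candidates([a, b])   # both lines are eligible again
```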

Signed-off-by: Heiner Litz
Reviewed-by: Hans Holmberg
Reviewed-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Heiner Litz
2019-02-11 23:18:08 +0800

12 Dec, 2018

3 commits

55d8ec353 lightnvm: pblk: support packed metadata

pblk performs recovery of open lines by storing the LBA in the per LBA
metadata field. Recovery therefore only works for drives that have this
field.

This patch adds support for packed metadata, which stores the L2P
mapping for open lines in the last sector of every write unit and
enables drives without per IO metadata to recover open lines.
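
A rough model of the packed layout, for illustration: the last sector
of each write unit carries the LBAs of the data sectors before it. The
sector size, the 64-bit little-endian entry format and the function
names are assumptions made for this sketch, not the on-media format
used by pblk.

```python
import struct

SECTOR_SIZE = 4096   # assumed sector size in bytes
LBA_FMT = "<Q"       # assumed: one little-endian 64-bit LBA per data sector


def build_write_unit(data_sectors, lbas):
    """Append one metadata sector holding the LBAs of the data sectors."""
    if len(data_sectors) != len(lbas):
        raise ValueError("one LBA is required per data sector")
    packed = b"".join(struct.pack(LBA_FMT, lba) for lba in lbas)
    if len(packed) > SECTOR_SIZE:
        raise ValueError("LBA list does not fit in one sector")
    return list(data_sectors) + [packed.ljust(SECTOR_SIZE, b"\0")]


def recover_lbas(write_unit):
    """Read the LBAs back from the last sector of a write unit."""
    nr_data = len(write_unit) - 1
    entry = struct.calcsize(LBA_FMT)
    meta = write_unit[-1]
    return [struct.unpack_from(LBA_FMT, meta, i * entry)[0] for i in range(nr_data)]


data = [b"a" * SECTOR_SIZE, b"b" * SECTOR_SIZE, b"c" * SECTOR_SIZE]
unit = build_write_unit(data, [100, 7, 42])
recovered = recover_lbas(unit)
```

The cost of this scheme is one sector of capacity per write unit, which
is the trade-off for not relying on per IO metadata.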

After this patch, drives with OOB size
Signed-off-by: Igor Konopko
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Igor Konopko
2018-12-12 03:22:35 +0800
faa79f27f lightnvm: pblk: add helpers for OOB metadata

pblk currently assumes that the size of OOB metadata on the drive is
always equal to the size of the pblk_sec_meta struct. This commit adds
helpers that will allow handling different sizes of OOB metadata on the
drive in the future.

After this patch only OOB metadata equal to 16 bytes is supported.

Reviewed-by: Javier González
Signed-off-by: Igor Konopko
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Igor Konopko
2018-12-12 03:22:35 +0800
525f7bb2c lightnvm: pblk: stop writes gracefully when running out of lines

If mapping fails (i.e. when running out of lines), handle the error
and stop writing.

Signed-off-by: Hans Holmberg
Reviewed-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Hans Holmberg
2018-12-12 03:22:33 +0800

09 Oct, 2018

3 commits

02a1520d5 lightnvm: pblk: add SPDX license tag

Add GPL-2.0 SPDX license tag to all pblk files

Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Javier González
2018-10-09 22:25:08 +0800
43241cfe4 lightnvm: pblk: remove debug from pblk_[down/up]_page

Remove the debug only iteration within __pblk_down_page, which
then allows us to reduce the number of arguments down to pblk and
the parallel unit from the functions that call it, simplifying the
callers' logic considerably.

Also, rename the functions pblk_[down/up]_page to
pblk_[down/up]_chunk, to communicate that it manages the write
pointer of the chunk. Note that it also protects the parallel unit
such that at most one chunk is active per parallel unit.

Signed-off-by: Matias Bjørling
Reviewed-by: Javier González
Signed-off-by: Jens Axboe

Matias Bjørling
2018-10-09 22:25:07 +0800
d68a93440 lightnvm: introduce nvm_rq_to_ppa_list

There are a number of places in the lightnvm subsystem where the user
iterates over the ppa list. Before iterating, the user must know if it
is a single or multiple LBAs due to vector commands using either the
nvm_rq ->ppa_addr or ->ppa_list fields on command submission, which
leads to open-coding the if/else statement.

Instead of having multiple if/else's, move it into a function that can
be called by its users.

A nice side effect of this cleanup is that this patch fixes up a
bunch of cases where we don't consider the single-ppa case in pblk.
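
The idea can be modelled outside the kernel as below. The Request class
and rq_to_ppa_list are hypothetical Python stand-ins for the request
structure and the helper named in the title; only the field names
nr_ppas, ppa_addr and ppa_list follow the commit text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Request:
    nr_ppas: int
    ppa_addr: int = 0                                   # used for a single address
    ppa_list: List[int] = field(default_factory=list)   # used for several addresses


def rq_to_ppa_list(rq: Request) -> List[int]:
    """Return the addresses as a list, whichever field holds them."""
    return rq.ppa_list if rq.nr_ppas > 1 else [rq.ppa_addr]


single = Request(nr_ppas=1, ppa_addr=0x10)
vector = Request(nr_ppas=3, ppa_list=[0x20, 0x21, 0x22])

# Callers iterate the same way in both cases.
seen = [ppa for rq in (single, vector) for ppa in rq_to_ppa_list(rq)]
```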

Signed-off-by: Hans Holmberg
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Hans Holmberg
2018-10-09 22:25:07 +0800

01 Jun, 2018

1 commit

2deeefc02 lightnvm: pblk: fail gracefully on line alloc. failure

In the event of a line failing to allocate, fail gracefully and stop the
pipeline to avoid more writes failing in the same place.

Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Javier González
2018-06-01 21:43:53 +0800

30 Mar, 2018

2 commits

694715137 lightnvm: add support for 2.0 address format

Add support for 2.0 address format. Also, align address bits for 1.2 and
2.0 to be able to operate on channel and luns without requiring a format
conversion. Use a generic address format for this purpose.

Also, convert the generic operations to the generic format in pblk.

Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Javier González
2018-03-30 07:29:09 +0800
76758390f lightnvm: pblk: export write amplification counters to sysfs

In an SSD, write amplification, WA, is defined as the average
number of page writes per user page write. Write amplification
negatively affects write performance and decreases the lifetime
of the disk, so it's a useful metric to add to sysfs.

In pblk's case, the number of writes per user sector is the sum of:

(1) number of user writes
(2) number of sectors written by the garbage collector
(3) number of sectors padded (i.e. due to syncs)

This patch adds persistent counters for 1-3 and two sysfs attributes
to export these along with WA calculated with five decimals:

write_amp_mileage: the accumulated write amplification stats
                   for the lifetime of the pblk instance

write_amp_trip:    resettable stats to facilitate delta measurements,
                   values reset at creation and if 0 is written
                   to the attribute.

64-bit counters are used as a 32-bit counter would wrap around
already after about 17 TB worth of user data. It will take a
long long time before the 64-bit sector counters wrap around.

The counters are stored after the bad block bitmap in the first
emeta sector of each written line. There is plenty of space in the
first emeta sector, so we don't need to bump the major version of
the line data format.
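
As a worked example of the arithmetic above, the following sketch
computes WA from the three counters with five decimals using integer
math, and checks the 32-bit wrap-around figure. The function name, the
zero-write behaviour and the 4 KB sector size are assumptions for
illustration; the sysfs output format is not reproduced here.

```python
def write_amp(user: int, gc: int, pad: int) -> str:
    """Format (user + gc + pad) / user with five decimals, integers only."""
    if user == 0:
        return "0.00000"  # assumption: report zero before any user write
    scaled = (user + gc + pad) * 100000 // user
    return f"{scaled // 100000}.{scaled % 100000:05d}"


wa = write_amp(user=1000, gc=250, pad=50)

# A 32-bit sector counter with 4 KB sectors wraps after 2**32 * 4096 bytes.
wrap_tb = 2**32 * 4096 / 10**12   # roughly 17.6 TB
```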

Signed-off-by: Hans Holmberg
Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Hans Holmberg
2018-03-30 07:29:09 +0800

05 Jan, 2018

1 commit

26f76dce6 lightnvm: use internal pblk methods

Now that rrpc has been removed, the only user of the ppa helpers
is pblk. However, pblk already defines similar functions.

Switch pblk to use the internal ones, and remove the generic ppa
helpers.

Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Matias Bjørling
2018-01-05 23:50:12 +0800

13 Oct, 2017

2 commits

03e868eb8 lightnvm: pblk: correct valid lba count calculation

During garbage collection, lbas being written can end up
being invalidated. Make sure that this is reflected in
the valid lba count.

Signed-off-by: Hans Holmberg
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Hans Holmberg
2017-10-13 22:34:57 +0800
21d228711 lightnvm: pblk: enable 1 LUN configuration

Metadata I/Os are scheduled to minimize their impact on user data I/Os.
When there are enough LUNs instantiated (i.e., enough bandwidth), it is
easy to interleave metadata and data one after the other so that
metadata I/Os are the ones being blocked and not vice-versa.

We do this by calculating the distance between the I/Os in terms of the
LUNs that are not in use, and selecting a free LUN that satisfies
the simple heuristic that metadata is scheduled behind. The per-LUN
semaphores guarantee consistency. This works fine on >1 LUN
configuration. However, when a single LUN is instantiated, this design
leads to a deadlock, where metadata waits to be scheduled on a free LUN.

This patch implements the 1 LUN case by simply scheduling the metadata
I/O after the data I/O. In the process, we refactor the way a line is
replaced to ensure that metadata writes are submitted after data writes
in order to guarantee block sequentiality. Note that, since there is
only one LUN, both I/Os will block each other by design. However, such
a configuration only pursues tight read latencies, not write bandwidth.

Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Javier González
2017-10-13 22:34:57 +0800

01 Jul, 2017

1 commit

f417aa0bd lightnvm: pblk: fix bad le64 assignations

Use the right types and conversions on le64 variables. Reported by
sparse.

Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Javier González
2017-07-01 01:08:18 +0800

27 Jun, 2017

5 commits

588726d3e lightnvm: pblk: fail gracefully on irrec. error

Due to user writes being decoupled from media writes because of the need
for an intermediate write buffer, irrecoverable media write errors lead
to pblk stalling; user writes fill up the buffer and end up in an
infinite retry loop.

In order to let user writes fail gracefully, it is necessary for pblk to
keep track of its own internal state and prevent further writes from
being placed into the write buffer.

This patch implements a state machine to keep track of internal errors
and, in case of failure, fail further user writes in a standard way.
Depending on the type of error, pblk will do its best to persist
buffered writes (which are already acknowledged) and close down in a
graceful manner. This way, data might be recovered by re-instantiating
pblk. Such a state machine paves the way for a state-based FTL log.
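
A toy version of such a state machine, for illustration only. The state
names, the Target class and its methods are hypothetical and much
simpler than the real error handling: they only show user writes being
rejected once an error is recorded, while already buffered data is
still persisted.

```python
from enum import Enum, auto


class State(Enum):
    RUNNING = auto()
    STOPPING = auto()   # flushing writes that were already acknowledged
    STOPPED = auto()


class Target:
    def __init__(self):
        self.state = State.RUNNING
        self.buffer = []      # acknowledged but not yet persisted
        self.persisted = []

    def write(self, data) -> bool:
        """Accept a user write only while running; otherwise fail it."""
        if self.state is not State.RUNNING:
            return False
        self.buffer.append(data)
        return True

    def on_media_error(self) -> None:
        """Stop accepting writes, persist buffered data, then stop."""
        self.state = State.STOPPING
        self.persisted.extend(self.buffer)
        self.buffer.clear()
        self.state = State.STOPPED


t = Target()
accepted = t.write("a")
t.on_media_error()
rejected = t.write("b")
```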

Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Javier González
2017-06-27 06:27:39 +0800
0880a9aa2 lightnvm: pblk: delete redundant buffer pointer

After refactoring the metadata path, the backpointer controlling
synced I/Os in a line becomes unnecessary; metadata is scheduled
on the write thread, thus we know when the end of the line is reached
and act on it directly.

Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Javier González
2017-06-27 06:27:39 +0800
dd2a43437 lightnvm: pblk: sched. metadata on write thread

At the moment, line metadata is persisted on a separate work queue, that
is kicked each time that a line is closed. The assumption when designing
this was that freeing the write thread from creating a new write request
was better than the potential impact of writes colliding on the media
(user I/O and metadata I/O). Experimentation has proven that this
assumption is wrong; collision can cost up to 25% of bandwidth and
introduce long tail latencies on the write thread, which potentially
cause user write threads to spend more time spinning to get a free entry
on the write buffer.

This patch moves the metadata logic to the write thread. When a line is
closed, remaining metadata is written in memory and is placed on a
metadata queue. The write thread then takes the metadata corresponding
to the previous line, creates the write request and schedules it to
minimize collisions on the media. Using this approach, we see that we
can saturate the media's bandwidth, which helps reduce both write
latencies and the spinning time for user writer threads.

Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Javier González
2017-06-27 06:27:39 +0800
d624f371d lightnvm: pblk: generalize erase path

Erase I/Os are scheduled with the following goals in mind: (i) minimize
LUN collisions with write I/Os, and (ii) even out the price of erasing
on every write, instead of putting all the burden on when garbage
collection runs. This works well on the current design, but is specific
to the default mapping algorithm.

This patch generalizes the erase path so that other mapping algorithms
can select an arbitrary line to be erased instead. It also gets rid of
the erase semaphore since it creates jitter for user writes.

Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Javier González
2017-06-27 06:24:53 +0800
caa69fa56 lightnvm: pblk: spare double cpu_to_le64 calc.

Spare a double calculation on the fast write path.

Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Javier González
2017-06-27 06:24:53 +0800

24 Apr, 2017

1 commit

a44f53faf lightnvm: pblk: fix erase counters on error fail

When block erases fail, these blocks are marked bad. The number of valid
blocks in the line was not updated, which could cause an infinite loop
on the erase path.

Fix this atomic counter and, in order to avoid taking an irq lock on the
interrupt context, make the erase counters atomic too.

Also, in the case that a significant number of blocks become bad in a
line, the double shared metadata buffer (emeta) ends up stopping the
pipeline until all metadata is flushed to the media. Increase the
number of metadata lines from 2 to 4 to avoid this case.

Fixes: a4bd217b4326 "lightnvm: physical block device (pblk) target"

Signed-off-by: Javier González
Reviewed-by: Matias Bjørling
Signed-off-by: Jens Axboe

Javier González
2017-04-24 06:57:52 +0800

17 Apr, 2017

1 commit

a4bd217b4 lightnvm: physical block device (pblk) target

This patch introduces pblk, a host-side translation layer for
Open-Channel SSDs to expose them like block devices. The translation
layer allows data placement decisions and I/O scheduling to be
managed by the host, enabling users to optimize the SSD for their
specific workloads.

An open-channel SSD has a set of LUNs (parallel units) and a
collection of blocks. Each block can be read in any order, but
writes must be sequential. Writes may also fail, and if a block
requires it, must also be reset before new writes can be
applied.

To manage the constraints, pblk maintains a logical to
physical address (L2P) table, write cache, garbage
collection logic, recovery scheme, and logic to rate-limit
user I/Os versus garbage collection I/Os.

The L2P table is fully-associative and manages sectors at a
4KB granularity. Pblk stores the L2P table in two places, in
the out-of-band area of the media and on the last page of a
line. In the case of a power failure, pblk will perform a
scan to recover the L2P table.
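
A recovery scan of this kind could be pictured as replaying
(physical address, LBA) pairs read from the media. The sketch below is
a deliberate simplification with hypothetical names; it assumes the
pairs are scanned in write order, so the latest write of an LBA wins.

```python
def recover_l2p(scanned):
    """Rebuild the L2P table from (ppa, lba) pairs scanned in write order."""
    l2p = {}
    for ppa, lba in scanned:
        l2p[lba] = ppa   # a later write of the same LBA overrides the older one
    return l2p


# LBA 10 was written twice; only its latest physical location survives.
table = recover_l2p([(0, 10), (1, 11), (2, 10)])
```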

The user data is organized into lines. A line is data
striped across blocks and LUNs. The lines enable the host to
reduce the amount of metadata to maintain besides the user
data and make it easier to implement RAID or erasure coding
in the future.

pblk implements multi-tenant support and can be instantiated
multiple times on the same drive. Each instance owns a
portion of the SSD - both regarding I/O bandwidth and
capacity - providing I/O isolation for each case.

Finally, pblk also exposes a sysfs interface that allows
user-space to peek into the internals of pblk. The interface
is available at /sys/block/*/pblk/ where * is the block
device name exposed.

This work also contains contributions from:
Matias Bjørling
Simon A. F. Lund
Young Tack Jin
Huaicheng Li

Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Javier González
2017-04-17 00:06:33 +0800