Eric Lee / smarc-fsl-linux-kernel

30 Mar, 2018

7 commits

3b2a3ad11 lightnvm: pblk: implement 2.0 support

Implement 2.0 support in pblk. This includes the address formatting and
mapping paths, as well as the sysfs entries for them.

Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Javier González
2018-03-30 07:29:09 +0800
bb845ae45 lightnvm: pblk: rename ppaf* to addrf*

In preparation for 2.0 support in pblk, rename variables referring to
the address format to addrf and reserve ppaf for the 1.2 path.

Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Javier González
2018-03-30 07:29:09 +0800
694715137 lightnvm: add support for 2.0 address format

Add support for 2.0 address format. Also, align address bits for 1.2 and
2.0 to be able to operate on channel and luns without requiring a format
conversion. Use a generic address format for this purpose.

Also, convert the generic operations to the generic format in pblk.
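
The idea of a generic address format, in which channel and LUN bits sit at the same position for both revisions, can be sketched as follows. This is a minimal illustration only: the struct, the field names, and the helper functions are hypothetical and are not taken from the lightnvm sources. The revision-specific part of the address is collapsed into a single "rest" field.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical generic layout: a bit offset and a bit length per field.
 * "rest" stands for the revision-specific bits (e.g. chunk and sector). */
struct addr_format {
	uint8_t ch_offset, ch_len;
	uint8_t lun_offset, lun_len;
	uint8_t rest_offset, rest_len;
};

/* Mask covering the low `len` bits. */
static inline uint64_t field_mask(uint8_t len)
{
	assert(len < 64);
	return (UINT64_C(1) << len) - 1;
}

/* Pack the three fields into a single 64-bit address. */
static inline uint64_t addr_pack(const struct addr_format *f,
				 uint64_t ch, uint64_t lun, uint64_t rest)
{
	return ((ch & field_mask(f->ch_len)) << f->ch_offset) |
	       ((lun & field_mask(f->lun_len)) << f->lun_offset) |
	       ((rest & field_mask(f->rest_len)) << f->rest_offset);
}

/* Channel and LUN can be read without knowing the revision. */
static inline uint64_t addr_channel(const struct addr_format *f, uint64_t addr)
{
	return (addr >> f->ch_offset) & field_mask(f->ch_len);
}

static inline uint64_t addr_lun(const struct addr_format *f, uint64_t addr)
{
	return (addr >> f->lun_offset) & field_mask(f->lun_len);
}

static inline uint64_t addr_rest(const struct addr_format *f, uint64_t addr)
{
	return (addr >> f->rest_offset) & field_mask(f->rest_len);
}
```

With such a layout, code that only cares about channels and LUNs can call addr_channel() and addr_lun() on any address, and only the code that interprets the remaining bits needs to know which revision it is dealing with.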

Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Javier González
2018-03-30 07:29:09 +0800
a40afad90 lightnvm: normalize geometry nomenclature

Normalize nomenclature for naming channels, luns, chunks, planes and
sectors as well as derivations in order to improve readability.

Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Javier González
2018-03-30 07:29:09 +0800
e46f4e482 lightnvm: simplify geometry structure

Currently, the device geometry is stored redundantly in the nvm_id and
nvm_geo structures at a device level. Moreover, when instantiating
targets on a specific number of LUNs, these structures are replicated
and manually modified to fit the instance channel and LUN partitioning.

Instead, create a generic geometry around nvm_geo, which can be used by
(i) the underlying device to describe the geometry of the whole device,
and (ii) instances to describe their geometry independently.

Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Javier González
2018-03-30 07:29:09 +0800
5d149bfab lightnvm: pblk: add padding distribution sysfs attribute

When pblk receives a sync, all data up to that point in the write buffer
must be committed to persistent storage. As flash memory comes with a
minimal write size, there is a significant cost involved, both in terms
of time for completing the sync and in terms of write amplification from
the sectors padded to fill up to the minimal write size.

In order to get a better understanding of the costs involved for syncs,
add a sysfs attribute to pblk: padded_dist, showing a normalized
distribution of sectors padded. In order to facilitate measurements of
specific workloads during the lifetime of the pblk instance, the
distribution can be reset by writing 0 to the attribute.

Do this by introducing counters for each possible padding:
{0..(minimal write size - 1)} and calculate the normalized distribution
when showing the attribute.
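
The counter-per-padding approach can be sketched as below. This is an illustration under assumptions, not the pblk code: the function names are hypothetical, and the choice to normalize to whole percentages (rounded down) is an assumption made here for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Record one sync that needed `padded` padding sectors.
 * `counts` has one bucket per possible padding: 0..nr_buckets-1,
 * where nr_buckets equals the minimal write size in sectors. */
static inline void pad_dist_record(uint64_t *counts, size_t nr_buckets,
				   size_t padded)
{
	assert(padded < nr_buckets);
	counts[padded]++;
}

/* Normalize on demand: pct[i] becomes the share of bucket i in percent
 * (integer division, rounds down). Returns the total number of samples.
 * Assumes counts[i] * 100 does not overflow 64 bits. */
static inline uint64_t pad_dist_normalize(const uint64_t *counts,
					  size_t nr_buckets, uint64_t *pct)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < nr_buckets; i++)
		total += counts[i];

	for (i = 0; i < nr_buckets; i++)
		pct[i] = total ? counts[i] * 100 / total : 0;

	return total;
}

/* Reset the distribution, as writing 0 to the attribute would. */
static inline void pad_dist_reset(uint64_t *counts, size_t nr_buckets)
{
	memset(counts, 0, nr_buckets * sizeof(*counts));
}
```

Keeping raw counters and normalizing only when the attribute is read keeps the write path cheap: recording a sample is a single increment.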

Signed-off-by: Hans Holmberg
Signed-off-by: Javier González
Rearranged total_buckets statement in pblk_sysfs_get_padding_dist
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Hans Holmberg
2018-03-30 07:29:09 +0800
76758390f lightnvm: pblk: export write amplification counters to sysfs

In an SSD, write amplification (WA) is defined as the average
number of page writes per user page write. Write amplification
negatively affects write performance and decreases the lifetime
of the disk, so it's a useful metric to add to sysfs.

In pblk's case, the total number of sectors written to the media is the sum of:

(1) number of user writes
(2) number of sectors written by the garbage collector
(3) number of sectors padded (i.e. due to syncs)

This patch adds persistent counters for 1-3 and two sysfs attributes
to export these along with WA calculated with five decimals:

write_amp_mileage: the accumulated write amplification stats
for the lifetime of the pblk instance

write_amp_trip: resettable stats to facilitate delta measurements,
values reset at creation and if 0 is written
to the attribute.
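
Write amplification follows from the three counters as (user + gc + padded) / user. A minimal sketch of computing it with five decimals using integer arithmetic only (floating point is generally avoided in kernel code) is shown below. The struct and function names are hypothetical and the code is not taken from pblk.

```c
#include <stdint.h>

#define WA_SCALE UINT64_C(100000) /* five decimals */

/* Hypothetical container for the three sector counters. */
struct wa_counters {
	uint64_t user; /* sectors written by the user */
	uint64_t gc;   /* sectors written by the garbage collector */
	uint64_t pad;  /* sectors padded */
};

/* Returns WA multiplied by WA_SCALE, or 0 if no user data was written.
 * The division is split into integer and fractional parts so that the
 * total is never multiplied by WA_SCALE directly. The fractional step
 * assumes user < 2^64 / WA_SCALE. */
static inline uint64_t wa_scaled(const struct wa_counters *c)
{
	uint64_t total;

	if (c->user == 0)
		return 0;

	total = c->user + c->gc + c->pad;

	return (total / c->user) * WA_SCALE +
	       (total % c->user) * WA_SCALE / c->user;
}
```

The result can be printed as integer part wa / WA_SCALE followed by wa % WA_SCALE zero-padded to five digits. For example, 1000 user sectors, 250 GC sectors and 50 padded sectors give 130000, i.e. a WA of 1.30000.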

64-bit counters are used, as a 32-bit counter would wrap around
after only about 17 TB worth of user data. It will take a
very long time before the 64-bit sector counters wrap around.
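
The 17 TB figure can be checked with a short calculation, assuming that one counted sector is 4 KB (4096 bytes). The sector size and the helper name are assumptions made for this example.

```c
#include <stdint.h>

#define SECTOR_BYTES UINT64_C(4096) /* assumed sector size */

/* Bytes of user data a 32-bit sector counter can count before wrapping:
 * 2^32 sectors * 4096 bytes = 2^44 bytes. */
static inline uint64_t bytes_before_wrap_32(void)
{
	return (UINT64_C(1) << 32) * SECTOR_BYTES;
}
```

2^44 bytes is 17,592,186,044,416 bytes, or roughly 17.6 TB in decimal units (16 TiB).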

The counters are stored after the bad block bitmap in the first
emeta sector of each written line. There is plenty of space in the
first emeta sector, so we don't need to bump the major version of
the line data format.

Signed-off-by: Hans Holmberg
Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Hans Holmberg
2018-03-30 07:29:09 +0800

05 Jan, 2018

2 commits

a7689938e lightnvm: pblk: use exact free block counter in RL

Until now, pblk's rate-limiter has used a heuristic to reserve space for
GC I/O given that the over-provision area was fixed.

In preparation for allowing the over-provision area to be defined on
target creation, define a dedicated free_block counter in the
rate-limiter to track the number of blocks being used for user data.

Signed-off-by: Javier González
Signed-off-by: Hans Holmberg
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Javier González
2018-01-05 23:50:12 +0800
fae7fae40 lightnvm: make geometry structures 2.0 ready

Prepare for the 2.0 revision by adapting the geometry
structures to coexist with the 1.2 revision.

Signed-off-by: Matias Bjørling
Reviewed-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Matias Bjørling
2018-01-05 23:50:12 +0800

13 Oct, 2017

1 commit

d6b992f7a lightnvm: pblk: gc all lines in the pipeline before exit

Finish garbage collection of the lines that are in the gc pipeline
before exiting. Ensure that all lines already in the pipeline
go through, from read to write.

Do this by keeping track of how many lines are in the pipeline
and waiting for that number to reach zero before exiting the gc
reader task.
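
The drain-before-exit logic can be sketched with a single counter, here using C11 atomics in user space. This is an illustration only: the struct, its fields, and the function names are hypothetical, and the kernel code would use its own atomic and scheduling primitives rather than the ones shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical GC state: `pipeline_gc` counts lines that have entered
 * the pipeline (read started) but have not yet been fully written. */
struct gc_state {
	atomic_int pipeline_gc;
	atomic_bool running;
};

/* Called when a line is picked up for garbage collection. */
static inline void gc_line_enter(struct gc_state *gc)
{
	atomic_fetch_add(&gc->pipeline_gc, 1);
}

/* Called when the last valid sector of a line has been rewritten. */
static inline void gc_line_done(struct gc_state *gc)
{
	int prev = atomic_fetch_sub(&gc->pipeline_gc, 1);

	assert(prev > 0);
	(void)prev;
}

/* The reader task may exit only after a stop was requested and every
 * line already in the pipeline has gone through. */
static inline bool gc_reader_may_exit(struct gc_state *gc)
{
	return !atomic_load(&gc->running) &&
	       atomic_load(&gc->pipeline_gc) == 0;
}
```

A reader task built on this would keep polling or sleeping until gc_reader_may_exit() returns true, instead of exiting as soon as the stop request arrives.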

Since we're adding a new gc line counter, change the name of
inflight_gc to read_inflight_gc to make the distinction clear.

Signed-off-by: Hans Holmberg
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Hans Holmberg
2017-10-13 22:34:57 +0800

01 Jul, 2017

1 commit

653cbb847 lightnvm: pblk: remove unused return variable

Remove unused variable.

Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Javier González
2017-07-01 01:08:18 +0800

27 Jun, 2017

7 commits

588726d3e lightnvm: pblk: fail gracefully on irrec. error

Because user writes are decoupled from media writes by an intermediate
write buffer, irrecoverable media write errors lead to pblk stalling:
user writes fill up the buffer and end up in an infinite retry loop.

In order to let user writes fail gracefully, it is necessary for pblk to
keep track of its own internal state and prevent further writes from
being placed into the write buffer.

This patch implements a state machine to keep track of internal errors
and, in case of failure, fail further user writes in a standard way.
Depending on the type of error, pblk will do its best to persist
buffered writes (which are already acknowledged) and close down in a
graceful manner. This way, data might be recovered by re-instantiating
pblk. Such a state machine paves the way for a state-based FTL log.
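
A minimal sketch of such a state machine is shown below. The state names, the struct, and the functions are hypothetical and chosen for illustration; they are not the identifiers used in pblk, and the real transitions depend on the type of error.

```c
#include <errno.h>

/* Hypothetical states of the translation layer. */
enum ftl_state {
	FTL_RUNNING,  /* normal operation, user writes are accepted */
	FTL_STOPPING, /* error seen, buffered writes are being persisted */
	FTL_STOPPED,  /* buffer drained, instance is closed down */
};

struct ftl {
	enum ftl_state state;
};

/* Gate in front of the write buffer: returns 0 if the write may be
 * buffered, or -EIO once an internal error has been recorded. */
static inline int ftl_user_write_admit(const struct ftl *f)
{
	return f->state == FTL_RUNNING ? 0 : -EIO;
}

/* Called when an irrecoverable media write error is detected. */
static inline void ftl_media_error(struct ftl *f)
{
	if (f->state == FTL_RUNNING)
		f->state = FTL_STOPPING;
}

/* Called once already-acknowledged buffered writes are persisted. */
static inline void ftl_buffer_drained(struct ftl *f)
{
	if (f->state == FTL_STOPPING)
		f->state = FTL_STOPPED;
}
```

Checking the state before a write enters the buffer is what breaks the infinite retry loop: new user writes fail immediately with an error instead of waiting for buffer space that will never be freed.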

Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Javier González
2017-06-27 06:27:39 +0800
b20ba1bc7 lightnvm: pblk: redesign GC algorithm

At the moment, in order to get enough read parallelism, we have recycled
several lines at the same time. This approach has proven not to work
well when reaching capacity, since we end up mixing valid data from all
lines, thus not maintaining a sustainable free/recycled line ratio.

The new design relies on a two-level workqueue mechanism. In the first
level, we read the metadata for a number of lines based on the GC list
they reside on (this is governed by the number of valid sectors in each
line). In the second level, we recycle a single line at a time. Here, we
issue reads in parallel, while a single GC write thread places data in
the write buffer. This design allows us to (i) only move data from one
line at a time, thus maintaining a sane free/recycled ratio, and (ii)
keep the GC writer busy with recycled data.

Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Javier González
2017-06-27 06:27:39 +0800
0880a9aa2 lightnvm: pblk: delete redundant buffer pointer

After refactoring the metadata path, the backpointer controlling
synced I/Os in a line becomes unnecessary; metadata is scheduled
on the write thread, thus we know when the end of the line is reached
and act on it directly.

Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Javier González
2017-06-27 06:27:39 +0800
fd1b0158f lightnvm: pblk: delete redundant debug line stat

Remove a legacy variable that helped verify the consistency of the
run-time metadata for the free line list. With the new metadata layout,
this check is no longer necessary.

Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Javier González
2017-06-27 06:27:39 +0800
dd2a43437 lightnvm: pblk: sched. metadata on write thread

At the moment, line metadata is persisted on a separate work queue that
is kicked each time a line is closed. The assumption when designing
this was that freeing the write thread from creating a new write request
was better than the potential impact of writes colliding on the media
(user I/O and metadata I/O). Experimentation has proven that this
assumption is wrong; collisions can cost up to 25% of bandwidth and
introduce long tail latencies on the write thread, which potentially
cause user write threads to spend more time spinning to get a free entry
on the write buffer.

This patch moves the metadata logic to the write thread. When a line is
closed, remaining metadata is written in memory and is placed on a
metadata queue. The write thread then takes the metadata corresponding
to the previous line, creates the write request and schedules it to
minimize collisions on the media. Using this approach, we see that we
can saturate the media's bandwidth, which helps reduce both write
latencies and the spinning time for user writer threads.

Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Javier González
2017-06-27 06:27:39 +0800
c2e9f5d45 lightnvm: pblk: expose max sec per write on sysfs

Allow the maximum number of sectors per write command to be configured
through sysfs. This makes it easier to tune write command sizes for
different controller configurations.

Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Javier González
2017-06-27 06:24:53 +0800
db7ada33c lightnvm: pblk: add debug stat for read cache hits

Add a new debug counter to measure cache hits on the read path.

Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Javier González
2017-06-27 06:24:53 +0800

17 Apr, 2017

1 commit

a4bd217b4 lightnvm: physical block device (pblk) target

This patch introduces pblk, a host-side translation layer for
Open-Channel SSDs to expose them like block devices. The translation
layer allows data placement decisions and I/O scheduling to be
managed by the host, enabling users to optimize the SSD for their
specific workloads.

An open-channel SSD has a set of LUNs (parallel units) and a
collection of blocks. Each block can be read in any order, but
writes must be sequential. Writes may also fail, and a block
may need to be reset before new writes can be applied to it.

To manage the constraints, pblk maintains a logical to
physical address (L2P) table, write cache, garbage
collection logic, recovery scheme, and logic to rate-limit
user I/Os versus garbage collection I/Os.

The L2P table is fully-associative and manages sectors at a
4KB granularity. Pblk stores the L2P table in two places, in
the out-of-band area of the media and on the last page of a
line. In the case of a power failure, pblk will perform a
scan to recover the L2P table.
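
The in-memory side of a fully-associative L2P table can be sketched as a flat array indexed by logical sector, where any logical sector may point at any physical address. The names below are hypothetical and the sketch leaves out everything pblk does around the table (locking, the write cache, persistence, and recovery).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define ADDR_EMPTY UINT64_MAX /* marks a logical sector never written */

/* Hypothetical table: one physical address per 4 KB logical sector. */
struct l2p {
	uint64_t *map;
	size_t nr_lbas;
};

/* Allocate the table with every entry unmapped. Returns 0 or -1. */
static inline int l2p_init(struct l2p *t, size_t nr_lbas)
{
	size_t i;

	t->map = malloc(nr_lbas * sizeof(*t->map));
	if (!t->map)
		return -1;

	for (i = 0; i < nr_lbas; i++)
		t->map[i] = ADDR_EMPTY;

	t->nr_lbas = nr_lbas;
	return 0;
}

/* On write: point the logical sector at its new physical location.
 * The old location implicitly becomes invalid (garbage). */
static inline void l2p_update(struct l2p *t, size_t lba, uint64_t ppa)
{
	assert(lba < t->nr_lbas);
	t->map[lba] = ppa;
}

/* On read: translate a logical sector to its physical address. */
static inline uint64_t l2p_lookup(const struct l2p *t, size_t lba)
{
	assert(lba < t->nr_lbas);
	return t->map[lba];
}

static inline void l2p_free(struct l2p *t)
{
	free(t->map);
	t->map = NULL;
	t->nr_lbas = 0;
}
```

Because the array lives only in host memory, it has to be rebuilt after a power failure from the copies stored on the media, which is the role of the recovery scan.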

The user data is organized into lines. A line is data
striped across blocks and LUNs. The lines enable the host to
reduce the amount of metadata to maintain besides the user
data and make it easier to implement RAID or erasure coding
in the future.

pblk implements multi-tenant support and can be instantiated
multiple times on the same drive. Each instance owns a
portion of the SSD - both regarding I/O bandwidth and
capacity - providing I/O isolation for each case.

Finally, pblk also exposes a sysfs interface that allows
user-space to peek into the internals of pblk. The interface
is available at /sys/block/*/pblk/ where * is the block
device name exposed.

This work also contains contributions from:
Matias Bjørling
Simon A. F. Lund
Young Tack Jin
Huaicheng Li

Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Javier González
2017-04-17 00:06:33 +0800