21 Sep, 2016

2 commits

  • For a host to access an Open-Channel SSD, it has to know its geometry,
    so that it writes and reads at the appropriate device bounds.

    Currently, the geometry information is kept within the kernel, and not
    exported to user-space for consumption. This patch exposes the
    configuration through sysfs and enables user-space libraries, such as
    liblightnvm, to use the sysfs implementation to get the geometry of an
    Open-Channel SSD.

    The sysfs entries are stored within the device hierarchy, and can be
    found using the "lightnvm" device type.

    An example configuration looks like this:

    /sys/class/nvme/
    └── nvme0n1
        ├── capabilities: 3
        ├── device_mode: 1
        ├── erase_max: 1000000
        ├── erase_typ: 1000000
        ├── flash_media_type: 0
        ├── media_capabilities: 0x00000001
        ├── media_type: 0
        ├── multiplane: 0x00010101
        ├── num_blocks: 1022
        ├── num_channels: 1
        ├── num_luns: 4
        ├── num_pages: 64
        ├── num_planes: 1
        ├── page_size: 4096
        ├── prog_max: 100000
        ├── prog_typ: 100000
        ├── read_max: 10000
        ├── read_typ: 10000
        ├── sector_oob_size: 0
        ├── sector_size: 4096
        ├── media_manager: gennvm
        ├── ppa_format: 0x380830082808001010102008
        ├── vendor_opcode: 0
        ├── max_phys_secs: 64
        └── version: 1
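
As a rough illustration of how a user-space tool might consume these entries, the sketch below parses an attribute value as it would be read from a sysfs file and derives a raw capacity from the geometry. The struct, the helper names, and the capacity formula (the product of channels, luns, planes, blocks, pages and page size) are assumptions made for this example; they are not liblightnvm's API.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical container for a subset of the sysfs geometry attributes. */
struct ocssd_geo {
    uint64_t num_channels;
    uint64_t num_luns;     /* assumed to be luns per channel */
    uint64_t num_planes;
    uint64_t num_blocks;   /* assumed to be blocks per lun and plane */
    uint64_t num_pages;    /* assumed to be pages per block */
    uint64_t page_size;    /* bytes */
};

/* Parse one attribute value as read from sysfs, e.g. "1022\n" or "0x00010101\n". */
static uint64_t parse_attr(const char *text)
{
    /* Base 0 accepts both decimal and 0x-prefixed hexadecimal values. */
    return strtoull(text, NULL, 0);
}

/* Assumed formula: raw capacity in bytes is the product of all dimensions. */
static uint64_t geo_capacity_bytes(const struct ocssd_geo *g)
{
    return g->num_channels * g->num_luns * g->num_planes *
           g->num_blocks * g->num_pages * g->page_size;
}
```

Under these assumptions, the example values above (1 channel, 4 luns, 1 plane, 1022 blocks, 64 pages of 4096 bytes) give 1071644672 bytes, just under 1 GiB.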

    Signed-off-by: Simon A. F. Lund
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Simon A. F. Lund
     
  • LightNVM compatible device drivers do not have a method to expose
    LightNVM specific sysfs entries.

    To enable LightNVM sysfs entries to be exposed, lightnvm device
    drivers require a struct device to attach them to. To allow both the
    actual device driver and lightnvm sysfs entries to coexist, the device
    driver tracks the lifetime of the nvm_dev structure.

    This patch refactors NVMe and null_blk to handle the lifetime of struct
    nvm_dev, which eliminates the need for struct gendisk when a lightnvm
    compatible device is provided.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     

07 Jul, 2016

6 commits

  • The ppa list passed by reference to nvm_set_rqd_list() is updated when
    multiple planes are available. In that case, each PPA plane is
    incremented when the device side PPA list is created. This prevents the
    caller from relying on the PPA list being unmodified after a call.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • The [get/put]_blk API enables targets to get ownership of blocks at
    runtime. This information is currently not recorded on disk, and the
    information is therefore lost on power failure. To restore the
    metadata, the [get/put]_blk must persist its metadata. In that case,
    we need to control the outer lock, so that we can disable them while
    updating the on-disk metadata. Fortunately, the _unlocked versions can
    be removed, which allows us to move the lock into the [get/put]_blk
    functions.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • To enable persistent block management to easily control creation and
    removal of targets, we move target management into the media
    manager. The LightNVM core continues to maintain which target types are
    registered, while the media manager now keeps track of its initialized
    targets.

    Two new callbacks for the media manager are introduced: create_tgt and
    remove_tgt. Note that remove_tgt returns 0 on successfully removing a
    target, and returns 1 if the target was not found.
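
The return convention described above can be sketched in user-space C as follows. The table layout, the size limits, and the `mm_*` names are hypothetical and only illustrate a media manager keeping track of its own targets; the failure value of `mm_create_tgt` is likewise an assumption.

```c
#include <string.h>

#define MAX_TGTS     8
#define TGT_NAME_LEN 32

/* Hypothetical per-media-manager table of initialized targets. */
struct mm_targets {
    char names[MAX_TGTS][TGT_NAME_LEN];
    int  used[MAX_TGTS];
};

static void mm_init(struct mm_targets *mm)
{
    memset(mm, 0, sizeof(*mm));
}

/* Return the slot holding `name`, or -1 if no such target exists. */
static int mm_find(const struct mm_targets *mm, const char *name)
{
    int i;

    for (i = 0; i < MAX_TGTS; i++)
        if (mm->used[i] && strcmp(mm->names[i], name) == 0)
            return i;
    return -1;
}

/* Return 0 on success, -1 if the name is too long, taken, or the table is full. */
static int mm_create_tgt(struct mm_targets *mm, const char *name)
{
    int i;

    if (strlen(name) >= TGT_NAME_LEN || mm_find(mm, name) >= 0)
        return -1;
    for (i = 0; i < MAX_TGTS; i++) {
        if (!mm->used[i]) {
            strcpy(mm->names[i], name);
            mm->used[i] = 1;
            return 0;
        }
    }
    return -1;
}

/* Follows the convention in the text: 0 if removed, 1 if not found. */
static int mm_remove_tgt(struct mm_targets *mm, const char *name)
{
    int i = mm_find(mm, name);

    if (i < 0)
        return 1;
    mm->used[i] = 0;
    return 0;
}
```

A caller that loops over several media managers can use the 1 return value to move on to the next manager when the target is not found.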

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • It is not the responsibility of the media manager to keep track of
    open/closed blocks. This is better maintained within a target,
    which already manages this information on writes.

    Remove the statistics and merge the states NVM_BLK_ST_OPEN and
    NVM_BLK_ST_CLOSED.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • The ->reserved bit is not initialized when allocated on stack.
    This may lead targets to misinterpret the PPA as cached.

    Signed-off-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Javier González
     
  • Expose media manager mark_blk() to targets, as done for the rest of the
    media manager callback functions.

    Signed-off-by: Javier González
    Updated description
    Signed-off-by: Matias Bjørling

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Javier González
     

07 May, 2016

16 commits

  • The nvm_dev->max_pages_per_blk variable was removed in favor of the new
    nvm->sec_per_blk variable. The ->max_pages_per_blk variable was still
    used in rrpc_capacity, causing the reserved capacity to be reported as
    zero. Replace it with ->sec_per_blk to calculate the reserved area again.

    Signed-off-by: Javier González
    Updated patch description. Was "lightnvm: eliminate redundant variable"
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Javier González
     
  • The number of ppas contained in a request is not necessarily the number
    of pages that it maps to, either on the target or on the device side.
    To avoid confusion, rename nr_pages to nr_ppas, since that is what
    the variable actually contains.

    Signed-off-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Javier González
     
  • A target requires a method to identify PPAs that are either cached in
    memory or on disk. This can efficiently be maintained within the PPA.
    The target host-side translation table can then look up a PPA and know
    from the PPA if it is cached or on disk. In the case it is cached, it is
    the responsibility of the target to maintain this cache.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • Targets can update a block state when having a reference to an
    in-memory virtual block. In the case that a target does not keep the
    block metadata in memory, it does not have a way to update this
    structure.

    Therefore, expose gennvm_mark_blk() through the media manager's
    ->mark_blk() callback and let targets update the state structure through
    this callback.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • Targets associated with a device manager are not freed on device
    removal. They have to be manually removed before shutdown. Make sure
    any outstanding targets are freed upon shutdown.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • Until now, the dma pool has been exclusively used to allocate the ppa
    list being sent to the device. In pblk (upcoming), we use these pools to
    allocate metadata too. Thus, we generalize the names of some variables
    in the dma helper functions to make the code more readable.

    Signed-off-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Javier González
     
  • Enable metadata buffer to be sent to the device through the metadata
    field on the physical rw nvme command. The size of the metadata buffer
    must follow dev->oob_size * # of PPAs.
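
The size rule above amounts to a single multiplication. A minimal sketch, where the helper name and the overflow guard are additions made for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Expected metadata buffer size in bytes for a request: one out-of-band
 * area per PPA. Returns 0 if the multiplication would overflow size_t.
 */
static size_t meta_buf_size(size_t oob_size, size_t nr_ppas)
{
    if (oob_size != 0 && nr_ppas > SIZE_MAX / oob_size)
        return 0;
    return oob_size * nr_ppas;
}
```

For example, an out-of-band size of 16 bytes and a request of 64 PPAs would call for a 1024-byte metadata buffer.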

    Signed-off-by: Javier González
    Updated description.
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Javier González
     
  • The set_bb_tbl takes struct nvm_rq and only uses its ppa_list and
    nr_pages internally. Instead, make these two variables explicit.
    This allows a user to call it without initializing a struct nvm_rq
    first.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • A virtual block enables a block to identify multiple physical blocks.
    This is useful for metadata where the device media supports multiple
    planes. In that case, a block with multiple planes can be managed
    as a single vblk. This reduces the required metadata to as little as
    one fourth (with four planes).

    nvm_set_rqd_ppalist() takes care of expanding a ppa_list with vblks
    automatically. However, for some use-cases, where only a single physical
    block is required, the ppa_list should not be expanded.

    Therefore, add a vblk parameter to nvm_set_rqd_ppalist(), and only
    expand the ppa_list if vblk is set.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • The device ops->get_bb_tbl() takes a callback, that allows the caller
    to use its own callback function to update its data structures in the
    returning function.

    This makes it difficult to send parameters to the callback, and is
    usually circumvented by small private structures that carry both the
    caller's state and any flags needed to fulfill the update.

    Refactor ops->get_bb_tbl() to fill a data buffer with the status of the
    blocks returned, and let the user call the callback function manually.
    That will provide the necessary flags and data structures and simplify
    the logic around ops->get_bb_tbl().

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • Users that wish to iterate over all luns on a device must create a
    struct ppa_addr and separate iterators for channels and luns. To set the
    iterators, two loops are required, one to iterate channels, and another
    to iterate luns. This reduces readability.

    Introduce nvm_for_each_lun_ppa, which implements the nested loop and
    sets ppa, channel, and lun variable for each loop body, eliminating
    the boilerplate code.
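
A user-space sketch of such an iterator macro is shown below. It is written in the spirit of nvm_for_each_lun_ppa but is not the kernel definition: the simplified `struct ppa`, the macro name, and its parameter list are assumptions for this example.

```c
#include <stdint.h>

/* Simplified stand-in for the kernel's struct ppa_addr. */
struct ppa {
    uint32_t ch;
    uint32_t lun;
};

/*
 * Hypothetical iterator: two nested for loops. The inner loop condition
 * also fills in the ppa before each execution of the loop body. Note that
 * a `break` in the body only leaves the inner (lun) loop.
 */
#define for_each_lun_ppa(nr_chnls, luns_per_chnl, ppa, chid, lunid)     \
    for ((chid) = 0; (chid) < (nr_chnls); (chid)++)                     \
        for ((lunid) = 0;                                               \
             (lunid) < (luns_per_chnl) &&                               \
                 ((ppa).ch = (chid), (ppa).lun = (lunid), 1);           \
             (lunid)++)

/* Visit every lun once; returns the visit count and sums the linear ids. */
static unsigned visit_all_luns(uint32_t nr_chnls, uint32_t luns_per_chnl,
                               unsigned *sum)
{
    struct ppa ppa = { 0, 0 };
    uint32_t ch, lun;
    unsigned count = 0;

    *sum = 0;
    for_each_lun_ppa(nr_chnls, luns_per_chnl, ppa, ch, lun) {
        *sum += ppa.ch * luns_per_chnl + ppa.lun;
        count++;
    }
    return count;
}
```

The caller only declares the ppa and the two iterator variables; the nested loop structure is hidden in the macro.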

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • A target name must be unique. However, a per-device registration of
    targets is maintained on a dev->online_targets list, with a per-device
    search for targets upon registration.

    This results in a name collision when two targets, with the same name,
    are created on two different devices, where the per-device list is not
    shared.

    Signed-off-by: Simon A. F. Lund
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Simon A. F. Lund
     
  • The functions nvm_register_target(), nvm_unregister_target() and
    the associated list refer to a target type that is being registered by a
    target type module. Rename nvm_*_targets() to nvm_*_tgt_type(), so that
    the intention is clear.

    This enables target instances to use the _nvm_*_targets() naming.

    Signed-off-by: Simon A. F. Lund
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Simon A. F. Lund
     
  • The get block table command returns a list of blocks and planes
    with their associated state. Users, such as gennvm and sysblk,
    manage all planes as a single virtual block.

    It was therefore natural to fold the bad block list before it is
    returned. However, to allow users that manage blocks on a per-plane
    level to also use the interface, the get_bb_tbl interface is
    changed to not fold by default and instead let the caller fold if
    necessary.
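
A caller-side fold could look roughly like the sketch below. The state encoding, the table layout (the plane entries of one block stored next to each other), and the function name are assumptions for illustration; the rule that a bad state on any plane wins is one plausible folding policy.

```c
#include <stdint.h>

/* Assumed state encoding, for illustration only. */
#define BLK_T_FREE     0x0
#define BLK_T_BAD      0x1
#define BLK_T_GRWN_BAD 0x2

/*
 * Fold a per-plane table of nr_blks * nr_planes entries in place into
 * nr_blks entries. If any plane of a block is bad, the folded entry takes
 * that state; otherwise it takes the state of the first plane.
 * Returns the folded length, or -1 on invalid arguments.
 */
static int bb_tbl_fold(uint8_t *blks, int nr_blks, int nr_planes)
{
    int blk, pl;

    if (!blks || nr_blks < 0 || nr_planes < 1)
        return -1;

    for (blk = 0; blk < nr_blks; blk++) {
        int offset = blk * nr_planes;
        uint8_t type = blks[offset];

        for (pl = 0; pl < nr_planes; pl++) {
            if (blks[offset + pl] & (BLK_T_BAD | BLK_T_GRWN_BAD)) {
                type = blks[offset + pl];
                break;
            }
        }
        /* Safe in place: blk never exceeds offset, so nothing unread is lost. */
        blks[blk] = type;
    }
    return nr_blks;
}
```

Users that manage blocks per plane would simply skip this step and consume the unfolded table.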

    Reviewed by: Johannes Thumshirn
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • The flash page size (fpg) and size across planes (pfpg) are convenient
    to know when allocating buffer sizes. These have previously been
    calculated in various places. Replace with the pre-calculated values.

    Reviewed by: Johannes Thumshirn
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • The nvm_submit_ppa function assumes that users manage all plane
    blocks as a single block. Extend the API with nvm_submit_ppa_list
    to allow the user to send its own ppa list. If the user submits more
    than a single PPA, the user must take care to allocate and free
    the corresponding ppa list.

    Reviewed by: Johannes Thumshirn
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     

19 Mar, 2016

4 commits

  • PPAs sent to the device are separately acknowledged in a 64-bit status
    variable. The status is stored in DW0 and DW1 of the completion queue
    entry. Store this status inside the nvm_rq for further processing.

    This can later be used to implement retry techniques for failed writes
    and reads.
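
As a sketch of how such a status could be assembled and inspected, the helpers below combine the two dwords and test one bit per PPA. That DW0 carries the low 32 bits and that a set bit marks a failed PPA are assumptions of this example, as are the helper names.

```c
#include <stdint.h>

/* Combine the two completion dwords; DW0 is assumed to hold the low bits. */
static uint64_t ppa_status(uint32_t dw0, uint32_t dw1)
{
    return ((uint64_t)dw1 << 32) | dw0;
}

/* Assumed convention: bit idx set means the idx-th PPA of the request failed. */
static int ppa_failed(uint64_t status, unsigned idx)
{
    return idx < 64 && ((status >> idx) & 1);
}
```

A retry path could walk the bits of the stored status and resubmit only the PPAs whose bit is set.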

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjorling
     
  • Add a bitmap of luns to indicate the status
    of luns: inuse/available. When creating targets,
    perform the necessary check to avoid allocating luns
    that are already allocated.
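
The check described above can be sketched with a single 64-bit word as the lun map. The struct, the 64-lun limit, and the function names are assumptions for this example and do not reflect the kernel's bitmap helpers.

```c
#include <stdint.h>

/* Hypothetical lun map for devices with at most 64 luns; bit set = in use. */
struct lun_map {
    uint64_t bits;
};

/* Mask covering luns [begin, end] inclusive, or 0 if the range is invalid. */
static uint64_t lun_mask(unsigned begin, unsigned end)
{
    unsigned n;

    if (begin > end || end >= 64)
        return 0;
    n = end - begin + 1;
    if (n == 64)
        return ~UINT64_C(0);
    return ((UINT64_C(1) << n) - 1) << begin;
}

/*
 * Reserve luns [begin, end] for a new target. Returns 0 on success and -1
 * if the range is invalid or overlaps luns already in use. Nothing is
 * reserved on failure.
 */
static int lun_map_reserve(struct lun_map *m, unsigned begin, unsigned end)
{
    uint64_t mask = lun_mask(begin, end);

    if (!mask || (m->bits & mask))
        return -1;
    m->bits |= mask;
    return 0;
}

/* Release luns [begin, end] when a target is removed. */
static void lun_map_release(struct lun_map *m, unsigned begin, unsigned end)
{
    m->bits &= ~lun_mask(begin, end);
}
```

Checking the whole range before setting any bit keeps the map unchanged when target creation fails.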

    Signed-off-by: Wenwei Tao
    Freed dev->lun_map if nvm_core_init later failed in the init process.
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Wenwei Tao
     
  • We can create more than one target on a lightnvm
    device by specifying its begin lun and end lun.

    However, specifying only the physical address area is
    not enough; we also need to get the corresponding
    non-intersecting logical address area from
    the backend device's logical address space.
    Otherwise, the targets on the device might use
    the same logical addresses and cause incorrect
    information in the device's l2p table.

    Signed-off-by: Wenwei Tao
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Wenwei Tao
     
  • Pull block driver updates from Jens Axboe:
    "This is the block driver pull request for this merge window. It sits
    on top of for-4.6/core, that was just sent out.

    This contains:

    - A set of fixes for lightnvm. One from Alan, fixing an overflow,
    and the rest from the usual suspects, Javier and Matias.

    - A set of fixes for nbd from Markus and Dan, and a fixup from Arnd
    for correct usage of the signed 64-bit divider.

    - A set of bug fixes for the Micron mtip32xx, from Asai.

    - A fix for the brd discard handling from Bart.

    - Update the maintainers entry for cciss, since that hardware has
    transferred ownership.

    - Three bug fixes for bcache from Eric Wheeler.

    - Set of fixes for xen-blk{back,front} from Jan and Konrad.

    - Removal of the cpqarray driver. It has been disabled in Kconfig
    since 2013, and we were initially scheduled to remove it in 3.15.

    - Various updates and fixes for NVMe, with the most important being:

        - Removal of the per-device NVMe thread, replacing that with a
        watchdog timer instead. From Christoph.

        - Exposing the namespace WWID through sysfs, from Keith.

        - Set of cleanups from Ming Lin.

        - Logging the controller device name instead of the underlying
        PCI device name, from Sagi.

        - And a bunch of fixes and optimizations from the usual suspects
        in this area"

    * 'for-4.6/drivers' of git://git.kernel.dk/linux-block: (49 commits)
    NVMe: Expose ns wwid through single sysfs entry
    drivers:block: cpqarray clean up
    brd: Fix discard request processing
    cpqarray: remove it from the kernel
    cciss: update MAINTAINERS
    NVMe: Remove unused sq_head read in completion path
    bcache: fix cache_set_flush() NULL pointer dereference on OOM
    bcache: cleaned up error handling around register_cache()
    bcache: fix race of writeback thread starting before complete initialization
    NVMe: Create discard zero quirk white list
    nbd: use correct div_s64 helper
    mtip32xx: remove unneeded variable in mtip_cmd_timeout()
    lightnvm: generalize rrpc ppa calculations
    lightnvm: remove struct nvm_dev->total_blocks
    lightnvm: rename ->nr_pages to ->nr_sects
    lightnvm: update closed list outside of intr context
    xen/blback: Fit the important information of the thread in 17 characters
    lightnvm: fold get bb tbl when using dual/quad plane mode
    lightnvm: fix up nonsensical configure overrun checking
    xen-blkback: advertise indirect segment support earlier
    ...

    Linus Torvalds
     

04 Mar, 2016

2 commits


05 Feb, 2016

1 commit

  • System block allows the device to initialize with its configured media
    manager. The system block is written to disk, and read again when the
    media manager is determined. For this to work, the backend must store the
    data. Device drivers, such as null_blk, do not have any backend
    storage. This patch allows the media manager to be initialized without a
    storage backend.

    It also fixes incorrect configuration of capabilities in null_blk, as it
    does not support the get/set bad block interface.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     

14 Jan, 2016

1 commit


12 Jan, 2016

8 commits

  • Now that a device can be managed using the system blocks, a method to
    reset the device is necessary as well. This patch introduces logic to
    reset the device easily to factory state and exposes it through an
    ioctl.

    The ioctl takes the following flags:

    NVM_FACTORY_ERASE_ONLY_USER
        By default all blocks, except host-reserved blocks are erased upon
        factory reset. Instead of this, only erase host-reserved blocks.
    NVM_FACTORY_RESET_HOST_BLKS
        Mark host-reserved blocks to be erased and set their type to free.
    NVM_FACTORY_RESET_GRWN_BBLKS
        Mark "grown bad blocks" to be erased and set their type to free.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • Use system block information to register the appropriate media manager.
    This enables the LightNVM subsystem to instantiate a media manager
    selected by the user, instead of relying on automatic detection by each
    media manager loaded in the kernel.

    A device must now be initialized before it can proceed to initialize its
    media manager. Upon initialization, the configured media manager is
    automatically initialized as well.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • An Open-Channel SSD shall be initialized before use. To initialize, we
    define an on-disk format, that keeps a small set of metadata to bring up
    the media manager on top of the device.

    The initial step is introduced to allow a user to format the disks for a
    given media manager. During format, a system block is stored on one to
    three separate luns on the device. Each lun has the system block
    duplicated. During initialization, the system block can be retrieved and
    the appropriate media manager can be initialized.

    The on-disk format currently covers (struct nvm_system_block):

    - Magic value "NVMS".
    - Monotonically increasing sequence number.
    - The physical block erase count.
    - Version of the system block format.
    - Media manager type.
    - Media manager superblock physical address.

    The interface provides three functions to manage the system block:

    int nvm_init_sysblock(struct nvm_dev *, struct nvm_sb_info *)
    int nvm_get_sysblock(struct nvm_dev *dev, struct nvm_sb_info *)
    int nvm_update_sysblock(struct nvm_dev *dev, struct nvm_sb_info *)

    Each implements a part of the logic to manage the system block. The
    initialization creates the first system blocks and marks them on the
    device. Get retrieves the latest system block by scanning all pages in
    the associated system blocks. The update sysblock writes new metadata
    and allocates a new block if necessary.
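
The "get" step, which picks the latest system block among the scanned pages, can be sketched as below. The in-memory struct mirrors the field list above, but its field names, widths and ordering are assumptions, and `sysblk_latest` is a hypothetical helper rather than the kernel implementation.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative in-memory layout; names and widths are assumptions. */
struct sysblk {
    char     magic[4];   /* "NVMS" */
    uint32_t seqnr;      /* monotonically increasing sequence number */
    uint32_t erase_cnt;  /* physical block erase count */
    uint16_t version;    /* version of the system block format */
    char     mmtype[8];  /* media manager type */
    uint64_t fs_ppa;     /* media manager superblock physical address */
};

/*
 * Scan `nr` candidate pages and return the index of the valid entry with
 * the highest sequence number, or -1 if no page carries the magic value.
 */
static int sysblk_latest(const struct sysblk *pages, int nr)
{
    int i, best = -1;

    for (i = 0; i < nr; i++) {
        if (memcmp(pages[i].magic, "NVMS", 4) != 0)
            continue; /* unwritten or foreign page */
        if (best < 0 || pages[i].seqnr > pages[best].seqnr)
            best = i;
    }
    return best;
}
```

Because every update writes a higher sequence number, the newest copy can be found without any separate index.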

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • NAND MLC memories have both lower and upper pages. When programming,
    both of these must be written before data can be read. However,
    these lower and upper pages might not be placed at even and odd flash
    pages, but can be skipped. Therefore each flash memory has its lower
    pages defined, which can then be used when programming and to know when
    padding is necessary.

    This patch implements the lower page definition in the specification,
    and exposes it through a simple lookup table at dev->lptbl.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • Some flash media has extended capabilities, such as programming SLC
    pages on MLC/TLC flash, erase/program suspend, scramble and encryption.
    MCCAP is introduced to detect support for these capabilities in the
    command set.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • LightNVM targets need to know the state of the flash block when doing
    flash optimizations. An example is implementing a write buffer to
    respect the flash page size. Currently, block state is not accounted
    for; the media manager only differentiates among free, bad and in-use
    blocks.

    This patch adds the logic in the generic media manager to enable
    targets to manage open and closed blocks separately, and it implements
    such management in rrpc. It also adds a set of flags to describe the
    state of the block (open, closed, free, bad).

    In order to avoid taking two locks (nvm_lun and rrpc_lun) consecutively,
    we introduce lockless get_/put_block primitives so that the open and
    close list locks and future common logic is handled within the nvm_lun
    lock.

    Signed-off-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Javier González
     
  • The get/set bad block interface defines good block, factory bad block,
    grown bad block, device reserved block, and host reserved block.
    Unfortunately the grown bad block was missing, leaving the offsets wrong
    for device and host side reserved blocks.

    This patch adds the missing type and corrects the offsets.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • Internal logic for both core and media managers does not have a
    backing bio for issuing I/Os. Introduce nvm_submit_ppa to allow raw
    I/Os to be submitted to the underlying device driver.

    The function takes the device, ppa, data buffer and its length, and
    submits the I/O synchronously to the device. The return value may
    therefore be used to detect any errors regarding the issued I/O.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling