08 Oct, 2016

1 commit

  • Pull block layer updates from Jens Axboe:
    "This is the main pull request for block layer changes in 4.9.

    As mentioned at the last merge window, I've changed things up and now
    do just one branch for core block layer changes, and driver changes.
    This avoids dependencies between the two branches. Outside of this
    main pull request, there are two topical branches coming as well.

    This pull request contains:

    - A set of fixes, and a conversion to blk-mq, of nbd. From Josef.

    - Set of fixes and updates for lightnvm from Matias, Simon, and Arnd.
    Followup dependency fix from Geert.

    - General fixes from Bart, Baoyou, Guoqing, and Linus W.

    - CFQ async write starvation fix from Glauber.

    - Add support for delayed kick of the requeue list, from Mike.

    - Pull out the scalable bitmap code from blk-mq-tag.c and make it
    generally available under the name of sbitmap. Only blk-mq-tag uses
    it for now, but the blk-mq scheduling bits will use it as well.
    From Omar.

    - bdev thaw error propagation from Pierre.

    - Improve the blk polling statistics, and allow the user to clear
    them. From Stephen.

    - Set of minor cleanups from Christoph in block/blk-mq.

    - Set of cleanups and optimizations from me for block/blk-mq.

    - Various nvme/nvmet/nvmeof fixes from the various folks"

    * 'for-4.9/block' of git://git.kernel.dk/linux-block: (54 commits)
    fs/block_dev.c: return the right error in thaw_bdev()
    nvme: Pass pointers, not dma addresses, to nvme_get/set_features()
    nvme/scsi: Remove power management support
    nvmet: Make dsm number of ranges zero based
    nvmet: Use direct IO for writes
    admin-cmd: Added smart-log command support.
    nvme-fabrics: Add host_traddr options field to host infrastructure
    nvme-fabrics: revise host transport option descriptions
    nvme-fabrics: rework nvmf_get_address() for variable options
    nbd: use BLK_MQ_F_BLOCKING
    blkcg: Annotate blkg_hint correctly
    cfq: fix starvation of asynchronous writes
    blk-mq: add flag for drivers wanting blocking ->queue_rq()
    blk-mq: remove non-blocking pass in blk_mq_map_request
    blk-mq: get rid of manual run of queue with __blk_mq_run_hw_queue()
    block: export bio_free_pages to other modules
    lightnvm: propagate device_add() error code
    lightnvm: expose device geometry through sysfs
    lightnvm: control life of nvm_dev in driver
    blk-mq: register device instead of disk
    ...

    Linus Torvalds
     

21 Sep, 2016

5 commits

  • device_add() may fail, and all callers are supposed to check the
    return value, but one new user in lightnvm doesn't:

    drivers/lightnvm/sysfs.c: In function 'nvm_sysfs_register_dev':
    drivers/lightnvm/sysfs.c:184:2: error: ignoring return value of 'device_add',
    declared with attribute warn_unused_result [-Werror=unused-result]

    This changes the caller to propagate any error codes, which avoids
    the warning.

    Signed-off-by: Arnd Bergmann
    Fixes: 38c9e260b9f9 ("lightnvm: expose device geometry through sysfs")
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Arnd Bergmann
     
  • For a host to access an Open-Channel SSD, it has to know its geometry,
    so that it writes and reads at the appropriate device bounds.

    Currently, the geometry information is kept within the kernel, and not
    exported to user-space for consumption. This patch exposes the
    configuration through sysfs and enables user-space libraries, such as
    liblightnvm, to use the sysfs implementation to get the geometry of an
    Open-Channel SSD.

    The sysfs entries are stored within the device hierarchy, and can be
    found using the "lightnvm" device type.

    An example configuration looks like this:

    /sys/class/nvme/
    └── nvme0n1
        ├── capabilities: 3
        ├── device_mode: 1
        ├── erase_max: 1000000
        ├── erase_typ: 1000000
        ├── flash_media_type: 0
        ├── media_capabilities: 0x00000001
        ├── media_type: 0
        ├── multiplane: 0x00010101
        ├── num_blocks: 1022
        ├── num_channels: 1
        ├── num_luns: 4
        ├── num_pages: 64
        ├── num_planes: 1
        ├── page_size: 4096
        ├── prog_max: 100000
        ├── prog_typ: 100000
        ├── read_max: 10000
        ├── read_typ: 10000
        ├── sector_oob_size: 0
        ├── sector_size: 4096
        ├── media_manager: gennvm
        ├── ppa_format: 0x380830082808001010102008
        ├── vendor_opcode: 0
        ├── max_phys_secs: 64
        └── version: 1

    Signed-off-by: Simon A. F. Lund
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Simon A. F. Lund
     
  • LightNVM compatible device drivers do not have a method to expose
    LightNVM specific sysfs entries.

    To enable LightNVM sysfs entries to be exposed, lightnvm device
    drivers require a struct device to attach it to. To allow both the
    actual device driver and lightnvm sysfs entries to coexist, the device
    driver tracks the lifetime of the nvm_dev structure.

    This patch refactors NVMe and null_blk to handle the lifetime of struct
    nvm_dev, which eliminates the need for struct gendisk when a lightnvm
    compatible device is provided.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • With LightNVM enabled namespaces, the gendisk structure is not exposed
    to the user. This prevents LightNVM users from accessing the NVMe device
    driver specific sysfs entries, and LightNVM namespace geometry.

    Refactor the revalidation process, so that a namespace, instead of a
    gendisk, is revalidated. This allows later patches to wire the sysfs
    entries up to a non-gendisk namespace.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • If NO_DMA=y:

    drivers/built-in.o: In function `nvme_nvm_dev_dma_free':
    lightnvm.c:(.text+0x23df1a): undefined reference to `dma_pool_free'
    drivers/built-in.o: In function `nvme_nvm_dev_dma_alloc':
    lightnvm.c:(.text+0x23df38): undefined reference to `dma_pool_alloc'
    drivers/built-in.o: In function `nvme_nvm_destroy_dma_pool':
    lightnvm.c:(.text+0x23df4c): undefined reference to `dma_pool_destroy'
    drivers/built-in.o: In function `nvme_nvm_create_dma_pool':
    lightnvm.c:(.text+0x23df7e): undefined reference to `dma_pool_create'

    and

    ERROR: "dma_pool_destroy" [drivers/nvme/host/nvme-core.ko] undefined!
    ERROR: "dma_pool_free" [drivers/nvme/host/nvme-core.ko] undefined!
    ERROR: "dma_pool_alloc" [drivers/nvme/host/nvme-core.ko] undefined!
    ERROR: "dma_pool_create" [drivers/nvme/host/nvme-core.ko] undefined!

    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Geert Uytterhoeven
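The device_add() fix from Arnd Bergmann follows the usual kernel pattern of passing a callee's error code up instead of discarding it. Below is a minimal user-space sketch of that pattern, not the lightnvm code: `fake_device_add()` and `register_dev()` are hypothetical stand-ins, and the GCC/Clang `warn_unused_result` attribute is the one named in the `-Werror=unused-result` error quoted in that commit.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for device_add(): returns 0 on success or a
 * negative errno. warn_unused_result (GCC/Clang) makes the compiler
 * warn if a caller silently drops the return value. */
static int __attribute__((warn_unused_result)) fake_device_add(int should_fail)
{
	return should_fail ? -ENOMEM : 0;
}

/* Hypothetical stand-in for the sysfs registration helper: propagate
 * the error code to the caller instead of ignoring it. */
static int register_dev(int should_fail)
{
	int ret = fake_device_add(should_fail);

	/* device_add()-style calls return 0 or a negative errno */
	assert(ret <= 0);
	if (ret)
		return ret;	/* let the caller decide how to unwind */

	/* further setup would go here */
	return 0;
}
```

With this attribute in place, a bare `fake_device_add(0);` statement whose result is unused is the kind of call that triggers the warning shown in the commit.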
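For the sysfs geometry commit, the `name: value` tree is presumably a summary view: sysfs attributes are normally one file per attribute, each holding just the value. The following is a minimal user-space sketch of reading one numeric attribute. `read_sysfs_ulong()` is hypothetical, and the directory that holds the lightnvm attributes is an assumption to verify on a running system. Base 0 in `strtoul()` accepts both decimal values such as `num_luns` and `0x`-prefixed ones such as `media_capabilities`; the 96-bit `ppa_format` value does not fit in an `unsigned long` and is reported as an error.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Read one unsigned integer from a sysfs-style attribute file.
 * Returns 0 on success, -1 if the file cannot be read, does not start
 * with a number, or holds a value too large for unsigned long. */
static int read_sysfs_ulong(const char *path, unsigned long *out)
{
	char buf[64];
	char *end;
	FILE *f;

	assert(path != NULL && out != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	errno = 0;
	/* base 0: "4" is decimal, "0x00000001" is hexadecimal */
	*out = strtoul(buf, &end, 0);
	if (errno == ERANGE || end == buf)
		return -1;
	return 0;
}
```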
     

31 Aug, 2016

1 commit

  • This patch removes module_init()/module_exit() from driver code by using
    the module_misc_device() macro. All modules in this patch have a print
    statement, which is removed when the module_misc_device() macro is used.
    If that is undesirable, this patch can be dropped entirely; that is the
    only reason it is a separate patch.

    Signed-off-by: PrasannaKumar Muralidharan
    Signed-off-by: Greg Kroah-Hartman

    PrasannaKumar Muralidharan
     

21 Jul, 2016

1 commit

  • These two are confusing leftovers of the old world order, combining
    values of the REQ_OP_ and REQ_ namespaces. For callers that don't
    special case we mostly just replace bi_rw with bio_data_dir or
    op_is_write, except for the few cases where a switch over the REQ_OP_
    values makes more sense. Any check for READA is replaced with an
    explicit check for REQ_RAHEAD. Also remove the READA alias for
    REQ_RAHEAD.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Mike Christie
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

07 Jul, 2016

15 commits

  • The __nvm_submit_ppa() function is not used outside lightnvm core.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • The passed by reference ppa list in nvm_set_rqd_list() is updated when
    multiple planes are available. In that case, each PPA plane is
    incremented when the device-side PPA list is created. This prevents the
    caller from relying on the PPA list being unmodified after the call.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • The gen_mark_blk_bad function marks the wrong block when a block is on
    a different channel. Fix the index calculation, so that it updates the
    correct block.

    Reported-by: Javier Gonzalez
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • The nvm_get_blk() function is called with rlun->lock held. This is ok
    when the media manager implementation doesn't go out of its atomic
    context. However, if a media manager persists its metadata, and
    guarantees that the block is given to the target, this is no longer
    a viable approach. Therefore, clean up the flow of rrpc_map_page,
    and make sure that nvm_get_blk() is called without any locks acquired.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • The [get/put]_blk API enables targets to get ownership of blocks at
    runtime. This information is currently not recorded on disk, and the
    information is therefore lost on power failure. To restore the
    metadata, the [get/put]_blk must persist its metadata. In that case,
    we need to control the outer lock, so that we can disable them while
    updating the on-disk metadata. Fortunately, the _unlocked versions can
    be removed, which allows us to move the lock into the [get/put]_blk
    functions.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • The ->list, ->open_list, and ->closed_list lists were previously used
    for statistics. However, that usage has been removed, and thus these
    can safely be removed.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • If a media manager tries to initialize its targets upon media manager
    initialization, the media manager will need to know which target types
    are available in LightNVM. The lists of available managers and target
    types share the same lock.

    Therefore, on initialization, the nvm_lock is taken by LightNVM core,
    which later leads to a deadlock when target types are enumerated by the
    media manager.

    Add an exclusive lock for target types to resolve this conflict.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • To enable persistent block management to easily control creation and
    removal of targets, we move target management into the media
    manager. The LightNVM core continues to maintain which target types are
    registered, while the media manager now keeps track of its initialized
    targets.

    Two new media manager callbacks are introduced: create_tgt and
    remove_tgt. Note that remove_tgt returns 0 on successfully removing a
    target, and 1 if the target was not found.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • The generic manager should be called the general media manager, and
    instead of using the rather long name of "gennvm" in front of each data
    structure, use "gen" instead to shorten it. Update the description of
    the media manager as well to make the media manager purpose clearer.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • The responsibility of the media manager is not to keep track of
    open/closed blocks. This is better maintained within a target,
    that already manages this information on writes.

    Remove the statistics and merge the states NVM_BLK_ST_OPEN and
    NVM_BLK_ST_CLOSED.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • A couple of small checkpatch fixups to stop it from complaining.

    ./drivers/lightnvm/core.c:360: WARNING: line over 80 characters
    ./drivers/lightnvm/core.c:360: ERROR: trailing statements should be on
    next line
    ./drivers/lightnvm/core.c:503: WARNING: Block comments use a trailing */
    on a separate line

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • Checkpatch found three places where it preferred the type to be
    written out in full.

    ./drivers/lightnvm/rrpc.h:184: WARNING: Prefer 'unsigned int' to bare
    use of 'unsigned'
    ./drivers/lightnvm/rrpc.h:209: WARNING: Prefer 'unsigned int' to bare
    use of 'unsigned'
    ./drivers/lightnvm/rrpc.c:51: WARNING: Prefer 'unsigned int' to bare use
    of 'unsigned'

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • Mark functions not used outside of their implementing file as static.

    Signed-off-by: Johannes Thumshirn
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Johannes Thumshirn
     
  • Expose media manager mark_blk() to targets, as done for the rest of the
    media manager callback functions.

    Signed-off-by: Javier González
    Updated description
    Signed-off-by: Matias Bjørling

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Javier González
     
  • Break the loop when rqd is not null to reduce
    an unnecessary schedule.

    Signed-off-by: Wenwei Tao
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Wenwei Tao
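The gen_mark_blk_bad fix concerns turning a block's channel and LUN into an index into a flat LUN array. Below is a sketch of the usual row-major calculation, assuming LUNs are stored channel by channel with `luns_per_chnl` entries each; `struct geo` and `lun_index()` are hypothetical and not taken from the general media manager.

```c
#include <assert.h>

/* Hypothetical geometry: LUNs live in one flat array, channel by
 * channel, with luns_per_chnl entries per channel. */
struct geo {
	int nr_chnls;
	int luns_per_chnl;
};

/* Flat index of (ch, lun). Leaving out the channel term would select a
 * LUN on channel 0 for any block that sits on another channel. */
static int lun_index(const struct geo *g, int ch, int lun)
{
	assert(ch >= 0 && ch < g->nr_chnls);
	assert(lun >= 0 && lun < g->luns_per_chnl);
	return ch * g->luns_per_chnl + lun;
}
```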
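The create_tgt/remove_tgt commit specifies a return convention for remove_tgt: 0 when a target was removed and 1 when none was found. Below is a minimal sketch of a list-based lookup that honors that convention; `struct tgt` and this `remove_tgt()` are hypothetical and model only the return values, not the media manager's locking or teardown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical target instance kept on a media-manager-owned list. */
struct tgt {
	const char *name;
	struct tgt *next;
};

/* Unlink the target called `name` from *head.
 * Returns 0 if a target was removed, 1 if no such target was found. */
static int remove_tgt(struct tgt **head, const char *name)
{
	struct tgt **pp;

	assert(head != NULL && name != NULL);
	for (pp = head; *pp; pp = &(*pp)->next) {
		if (strcmp((*pp)->name, name) == 0) {
			*pp = (*pp)->next;	/* unlink; freeing is up to the owner */
			return 0;
		}
	}
	return 1;
}
```

Using a pointer-to-pointer walk avoids a special case for removing the list head.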
     

08 Jun, 2016

1 commit

  • This patch converts the simple bi_rw use cases in the block,
    drivers, mm and fs code to set/get the bio operation using
    bio_set_op_attrs/bio_op.

    These should be simple one- or two-liner cases, so I just did them
    in one patch. The next patches handle the more complicated
    cases in a module per patch.

    Signed-off-by: Mike Christie
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Mike Christie
     

07 May, 2016

16 commits

  • The nvm_dev->max_pages_per_blk variable was removed in favor of the new
    nvm->sec_per_blk variable. The ->max_pages_per_blk variable was still
    used in rrpc_capacity, causing the reserved capacity to be calculated as
    zero. Replace it with ->sec_per_blk to calculate the reserved area again.

    Signed-off-by: Javier González
    Updated patch description. Was "lightnvm: eliminate redundant variable"
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Javier González
     
  • The number of ppas contained in a request is not necessarily the number
    of pages that it maps to, either on the target or on the device side.
    In order to avoid confusion, rename nr_pages to nr_ppas since it is what
    the variable actually contains.

    Signed-off-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Javier González
     
  • Targets can update a block state when having a reference to an
    in-memory virtual block. In the case that a target does not keep the
    block metadata in memory, it does not have a way to update this
    structure.

    Therefore, expose gennvm_mark_blk() through the media manager's
    ->mark_blk() callback and let targets update the state structure through
    this callback.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • Targets associated with a device manager are not freed on device
    removal. They have to be manually removed before shutdown. Make sure
    any outstanding targets are freed upon shutdown.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • When doing GC, rrpc calculates the physical LUN to which the rrpc block
    belongs. This calculation is based on the assumption that LUNs are
    assigned sequentially to the LUN list. Use the reference to the LUN
    instead. This saves us the calculation and allows us to align LUNs in a
    different manner, for example to take advantage of device parallelism.

    Signed-off-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Javier González
     
  • Until now, the dma pool has been exclusively used to allocate the ppa
    list being sent to the device. In pblk (upcoming), we use these pools to
    allocate metadata too. Thus, we generalize the names of some variables
    on the dma helper functions to make the code more readable.

    Signed-off-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Javier González
     
  • rrpc does not save any metadata on a given request. Thus, do not attempt
    to free the metadata dma region.

    Signed-off-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Javier González
     
  • The ppa configured for retrieving the bad block table uses the internal
    lun id to setup the get bad block ppa. This increases monotonically
    with the number of luns available. When configuring a ppa, the channel and
    lun must be specified separately, leading to an out of bound memory
    access in gennvm_block_bb when lun id goes beyond the luns available
    within a channel.

    Additionally, remove the out-of-bounds check in gennvm_block_bb(), as it
    was buggy to begin with.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • The set_bb_tbl takes struct nvm_rq and only uses its ppa_list and
    nr_pages internally. Instead, make these two variables explicit.
    This allows a user to call it without initializing a struct nvm_rq
    first.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • We move the responsibility of managing the persistent bad block table to
    the target. The target may choose to mark a block bad or retry writing
    to it. Nevertheless, it should be the target that makes the decision
    and not the media manager.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • A virtual block enables a block to identify multiple physical blocks.
    This is useful for metadata where the device media supports multiple
    planes. In that case, a block with multiple planes can be managed as a
    single vblk, which on a quad-plane device reduces the required metadata
    to one fourth.

    nvm_set_rqd_ppalist() takes care of expanding a ppa_list with vblks
    automatically. However, for some use-cases, where only a single physical
    block is required, the ppa_list should not be expanded.

    Therefore, add a vblk parameter to nvm_set_rqd_ppalist(), and only
    expand the ppa_list if vblk is set.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • Now that device ops->get_bb_table no longer uses a callback, the
    struct factory_blks can be removed.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • The device ops->get_bb_tbl() takes a callback that allows the caller
    to use its own callback function to update its data structures in the
    returning function.

    This makes it difficult to send parameters to the callback, and is
    usually circumvented by small private structures that carry both the
    caller's state and any flags needed to fulfill the update.

    Refactor ops->get_bb_tbl() to fill a data buffer with the status of the
    blocks returned, and let the user call the callback function manually.
    That will provide the necessary flags and data structures and simplify
    the logic around ops->get_bb_tbl().

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • Users that wish to iterate over all luns on a device must create a
    struct ppa_addr and separate iterators for channels and luns. To set the
    iterators, two loops are required, one to iterate channels, and another
    to iterate luns. This decreases readability.

    Introduce nvm_for_each_lun_ppa, which implements the nested loop and
    sets ppa, channel, and lun variable for each loop body, eliminating
    the boilerplate code.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • A target name must be unique. However, a per-device registration of
    targets is maintained on a dev->online_targets list, with a per-device
    search for targets upon registration.

    This results in a name collision when two targets with the same name
    are created on two different devices, as the per-device lists are not
    shared.

    Signed-off-by: Simon A. F. Lund
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Simon A. F. Lund
     
  • The functions nvm_register_target(), nvm_unregister_target() and the
    associated list refer to a target type that is being registered by a
    target type module. Rename nvm_*_targets() to nvm_*_tgt_type(), so that
    the intention is clear.

    This enables target instances to use the _nvm_*_targets() naming.

    Signed-off-by: Simon A. F. Lund
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Simon A. F. Lund
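The bad block table fix from 07 May hinges on the difference between a device-wide LUN id and the per-channel LUN that a ppa address carries. Assuming ids are assigned channel by channel, the split is a division and a remainder. `split_lun_id()` and `struct ch_lun` are hypothetical helpers for illustration only.

```c
#include <assert.h>

/* Hypothetical (channel, per-channel LUN) pair, as a ppa would need. */
struct ch_lun {
	int ch;
	int lun;
};

/* Split a device-wide LUN id, assumed to be assigned channel by channel,
 * into channel and per-channel LUN. The resulting lun is always below
 * luns_per_chnl, so it cannot index past the LUNs of one channel. */
static struct ch_lun split_lun_id(int lun_id, int nr_chnls, int luns_per_chnl)
{
	struct ch_lun r;

	assert(lun_id >= 0 && lun_id < nr_chnls * luns_per_chnl);
	r.ch = lun_id / luns_per_chnl;
	r.lun = lun_id % luns_per_chnl;
	return r;
}
```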
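To illustrate the vblk parameter added to nvm_set_rqd_ppalist(), here is a simplified sketch that expands each address into one entry per plane only when `vblk` is set. It also takes the input list as `const`, so the caller's list stays unmodified, which is the concern raised in the nvm_set_rqd_list() commit of 07 Jul. The `struct ppa` layout, the function name, and the plane-major output ordering are assumptions, not the lightnvm definitions.

```c
#include <assert.h>

/* Hypothetical, simplified physical address: block and plane only. */
struct ppa {
	int blk;
	int pl;
};

/* Build the device-side list from the caller's list. With vblk set,
 * each input address becomes nr_planes entries (plane-major order);
 * otherwise entries are copied unchanged. `out` must have room for
 * nr_in * nr_planes entries. Returns the number of entries written. */
static int set_rqd_ppalist(struct ppa *out, const struct ppa *in,
			   int nr_in, int nr_planes, int vblk)
{
	int i, pl, n = 0;

	if (!vblk || nr_planes == 1) {
		for (i = 0; i < nr_in; i++)
			out[n++] = in[i];
		return n;
	}

	for (pl = 0; pl < nr_planes; pl++) {
		for (i = 0; i < nr_in; i++) {
			out[n] = in[i];	/* copy, never touch in[i] */
			out[n].pl = pl;
			n++;
		}
	}
	return n;
}
```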
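The ops->get_bb_tbl() refactor replaces a callback with a filled buffer that the caller walks itself. Below is a minimal sketch of that calling style with a simulated device; the state values, the `get_bb_tbl()` signature, and `count_bad_blocks()` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical block states; the real lightnvm values may differ. */
enum { BLK_FREE = 0, BLK_BAD = 1 };

/* Hypothetical device op in the refactored style: fill blks[] with one
 * state per block instead of invoking a caller-supplied callback.
 * The "device" is simulated by a static table. Returns the number of
 * entries written, or -1 if the buffer is too small. */
static int get_bb_tbl(uint8_t *blks, int nr_blks)
{
	static const uint8_t dev_tbl[] = { BLK_FREE, BLK_BAD, BLK_FREE, BLK_BAD };
	int n = (int)(sizeof(dev_tbl) / sizeof(dev_tbl[0]));
	int i;

	if (nr_blks < n)
		return -1;
	for (i = 0; i < n; i++)
		blks[i] = dev_tbl[i];
	return n;
}

/* The caller processes the buffer with its own state in scope, so no
 * private struct is needed to carry context into a callback. */
static int count_bad_blocks(void)
{
	uint8_t blks[16];
	int n = get_bb_tbl(blks, 16);
	int i, bad = 0;

	if (n < 0)
		return -1;
	for (i = 0; i < n; i++)
		if (blks[i] == BLK_BAD)
			bad++;
	return bad;
}
```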
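nvm_for_each_lun_ppa hides a channel/LUN nested loop behind a macro. The hypothetical `for_each_lun_ppa` below shows the idea with simplified types; it is not the kernel's macro. Updating the address inside the loop conditions is one design choice among several: it guarantees that `ppa` matches `chid` and `lunid` at the start of every body, including the first LUN of each new channel.

```c
#include <assert.h>

/* Hypothetical, simplified address and geometry types. */
struct ppa_addr {
	int ch;
	int lun;
};

struct geo {
	int nr_chnls;
	int luns_per_chnl;
};

/* Hypothetical nested-loop iterator: sets ppa, chid and lunid for every
 * LUN on the device. The comma expressions update ppa before each body
 * runs and always evaluate to true. */
#define for_each_lun_ppa(geo, ppa, chid, lunid)				\
	for ((chid) = 0; (chid) < (geo)->nr_chnls &&			\
			 ((ppa).ch = (chid), 1); (chid)++)		\
		for ((lunid) = 0; (lunid) < (geo)->luns_per_chnl &&	\
				  ((ppa).lun = (lunid), 1); (lunid)++)

/* Example: visit every LUN, check the address, and count them. */
static int count_luns(const struct geo *g)
{
	struct ppa_addr ppa = {0, 0};
	int ch, lun, n = 0;

	for_each_lun_ppa(g, ppa, ch, lun) {
		assert(ppa.ch == ch && ppa.lun == lun);
		n++;
	}
	return n;
}
```

Because the macro expands to two nested `for` statements, a `break` in the body leaves only the inner (LUN) loop.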