22 Sep, 2016

1 commit

  • bio_free_pages() was introduced in commit 1dfa0f68c040
    ("block: add a helper to free bio bounce buffer pages").
    Now that the helper has been imported, we can reuse it in
    other modules.

    Cc: Christoph Hellwig
    Cc: Jens Axboe
    Cc: Mike Snitzer
    Cc: Shaohua Li
    Signed-off-by: Guoqing Jiang
    Acked-by: Kent Overstreet
    Signed-off-by: Jens Axboe

    Guoqing Jiang
     

21 Sep, 2016

7 commits

  • device_add() may fail, and all callers are supposed to check the
    return value, but one new user in lightnvm doesn't:

    drivers/lightnvm/sysfs.c: In function 'nvm_sysfs_register_dev':
    drivers/lightnvm/sysfs.c:184:2: error: ignoring return value of 'device_add',
    declared with attribute warn_unused_result [-Werror=unused-result]

    This changes the caller to propagate any error codes, which avoids
    the warning.

    Signed-off-by: Arnd Bergmann
    Fixes: 38c9e260b9f9 ("lightnvm: expose device geometry through sysfs")
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Arnd Bergmann
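The pattern described in the device_add() fix above can be sketched in user-space C. This is a minimal illustration, not the lightnvm code: fake_device_add() and register_dev() are hypothetical stand-ins, and the attribute is the GCC/Clang spelling of the annotation that produces the quoted warning.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical stand-in for device_add(). The attribute makes the
 * compiler warn when a caller drops the return value.
 */
__attribute__((warn_unused_result))
static int fake_device_add(int should_fail)
{
	return should_fail ? -ENOMEM : 0;
}

/*
 * Hypothetical registration helper: hand the error code back to the
 * caller instead of discarding it.
 */
static int register_dev(int should_fail)
{
	int ret;

	ret = fake_device_add(should_fail);
	if (ret)
		return ret;	/* propagate the failure */

	/* ... further setup would go here ... */
	return 0;
}
```

Calling fake_device_add() as a bare statement would trigger the unused-result warning; assigning and checking the result, as register_dev() does, avoids it.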
     
  • For a host to access an Open-Channel SSD, it has to know its geometry,
    so that it writes and reads at the appropriate device bounds.

    Currently, the geometry information is kept within the kernel, and not
    exported to user-space for consumption. This patch exposes the
    configuration through sysfs and enables user-space libraries, such as
    liblightnvm, to use the sysfs implementation to get the geometry of an
    Open-Channel SSD.

    The sysfs entries are stored within the device hierarchy, and can be
    found using the "lightnvm" device type.

    An example configuration looks like this:

    /sys/class/nvme/
    └── nvme0n1
        ├── capabilities: 3
        ├── device_mode: 1
        ├── erase_max: 1000000
        ├── erase_typ: 1000000
        ├── flash_media_type: 0
        ├── media_capabilities: 0x00000001
        ├── media_type: 0
        ├── multiplane: 0x00010101
        ├── num_blocks: 1022
        ├── num_channels: 1
        ├── num_luns: 4
        ├── num_pages: 64
        ├── num_planes: 1
        ├── page_size: 4096
        ├── prog_max: 100000
        ├── prog_typ: 100000
        ├── read_max: 10000
        ├── read_typ: 10000
        ├── sector_oob_size: 0
        ├── sector_size: 4096
        ├── media_manager: gennvm
        ├── ppa_format: 0x380830082808001010102008
        ├── vendor_opcode: 0
        ├── max_phys_secs: 64
        └── version: 1

    Signed-off-by: Simon A. F. Lund
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Simon A. F. Lund
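As a rough illustration of how a user-space library might consume the geometry attributes listed above, the sketch below multiplies them into a raw capacity. The struct and function names are hypothetical, and the nesting of the fields (LUNs per channel, blocks per LUN and plane, pages per block) is an assumption; the commit message does not spell it out.

```c
#include <assert.h>
#include <stdint.h>

/* Field names mirror the sysfs attributes; the struct itself is hypothetical. */
struct ocssd_geometry {
	uint64_t num_channels;
	uint64_t num_luns;	/* assumed: LUNs per channel */
	uint64_t num_planes;
	uint64_t num_blocks;	/* assumed: blocks per LUN, per plane */
	uint64_t num_pages;	/* assumed: pages per block */
	uint64_t page_size;	/* bytes */
};

/* Raw capacity in bytes, assuming each level nests inside the previous one. */
static uint64_t ocssd_raw_bytes(const struct ocssd_geometry *g)
{
	return g->num_channels * g->num_luns * g->num_planes *
	       g->num_blocks * g->num_pages * g->page_size;
}
```

With the example values (1 channel, 4 LUNs, 1 plane, 1022 blocks, 64 pages, 4096-byte pages), this yields 1,071,644,672 bytes, just under 1 GiB.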
     
  • LightNVM compatible device drivers do not have a method to expose
    LightNVM specific sysfs entries.

    To enable LightNVM sysfs entries to be exposed, lightnvm device
    drivers require a struct device to attach them to. To allow both the
    actual device driver and lightnvm sysfs entries to coexist, the device
    driver tracks the lifetime of the nvm_dev structure.

    This patch refactors NVMe and null_blk to handle the lifetime of struct
    nvm_dev, which eliminates the need for struct gendisk when a lightnvm
    compatible device is provided.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • Enable devices without a gendisk instance to register themselves with
    blk-mq and expose the associated multi-queue sysfs entries.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • With LightNVM enabled devices, the gendisk structure is not exposed
    to the user. This hides the device driver specific sysfs entries, and
    prevents binding of LightNVM geometry information to the device.

    Refactor the device registration process, so that gendisk and
    non-gendisk devices are easily managed.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • With LightNVM enabled namespaces, the gendisk structure is not exposed
    to the user. This prevents LightNVM users from accessing the NVMe device
    driver specific sysfs entries, and LightNVM namespace geometry.

    Refactor the revalidation process, so that a namespace, instead of a
    gendisk, is revalidated. This allows later patches to wire the
    sysfs entries up to a non-gendisk namespace.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • If NO_DMA=y:

    drivers/built-in.o: In function `nvme_nvm_dev_dma_free':
    lightnvm.c:(.text+0x23df1a): undefined reference to `dma_pool_free'
    drivers/built-in.o: In function `nvme_nvm_dev_dma_alloc':
    lightnvm.c:(.text+0x23df38): undefined reference to `dma_pool_alloc'
    drivers/built-in.o: In function `nvme_nvm_destroy_dma_pool':
    lightnvm.c:(.text+0x23df4c): undefined reference to `dma_pool_destroy'
    drivers/built-in.o: In function `nvme_nvm_create_dma_pool':
    lightnvm.c:(.text+0x23df7e): undefined reference to `dma_pool_create'

    and

    ERROR: "dma_pool_destroy" [drivers/nvme/host/nvme-core.ko] undefined!
    ERROR: "dma_pool_free" [drivers/nvme/host/nvme-core.ko] undefined!
    ERROR: "dma_pool_alloc" [drivers/nvme/host/nvme-core.ko] undefined!
    ERROR: "dma_pool_create" [drivers/nvme/host/nvme-core.ko] undefined!

    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Geert Uytterhoeven
     

19 Sep, 2016

1 commit

  • Variable weight is not being initialized to zero before it is
    used to compute the weight sum. Ensure it is initialized to zero.

    Found with static analysis with cppcheck:
    [lib/sbitmap.c:177]: (error) Uninitialized variable: weight

    Signed-off-by: Colin Ian King
    Signed-off-by: Jens Axboe

    Colin Ian King
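The class of bug fixed above is easy to reproduce outside the kernel. The following is a minimal user-space sketch, not the lib/sbitmap.c code: bitmap_weight_sum() is a hypothetical name, and __builtin_popcountl() is the GCC/Clang builtin standing in for the kernel's bit-counting helper.

```c
#include <assert.h>
#include <stddef.h>

/* Sum the set bits across an array of words. */
static unsigned int bitmap_weight_sum(const unsigned long *words, size_t nr)
{
	unsigned int weight = 0;	/* the accumulator must start at zero */
	size_t i;

	for (i = 0; i < nr; i++)
		weight += (unsigned int)__builtin_popcountl(words[i]);

	return weight;
}
```

Without the "= 0" initializer, the sum would start from whatever value happened to be on the stack, which is what cppcheck flagged.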
     

18 Sep, 2016

1 commit

  • If we have a bunch of high-numbered bits allocated and then we resize
    the struct sbitmap_queue, when those bits get cleared, we'll update the
    hint and then have to re-randomize it repeatedly. Avoid that by checking
    that the cleared bit is still a valid hint. No measurable performance
    difference in the common case.

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
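The check described above can be sketched as follows. This is a simplified single-threaded model with hypothetical names; the real code works on a per-cpu hint inside struct sbitmap_queue.

```c
#include <assert.h>

/*
 * Hypothetical model of updating the allocation hint when a bit is
 * cleared: only adopt the cleared bit as the new hint if it is still
 * inside the (possibly shrunk) bitmap.
 */
static void hint_on_clear(unsigned int *hint, unsigned int cleared_bit,
			  unsigned int depth)
{
	if (cleared_bit < depth)
		*hint = cleared_bit;
	/* otherwise keep the old hint and avoid a later re-randomization */
}
```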
     

17 Sep, 2016

7 commits

  • After a struct sbitmap_queue is resized smaller, the allocation hints
    may still be set to bits beyond the new depth of the bitmap. This means
    that, for example, if the number of blk-mq tags is reduced through
    sysfs, more requests than the nominal queue depth may be in flight.

    It's tempting to fix this at resize time by doing a one-time
    reinitialization of the hints, but this can race with
    __sbitmap_queue_get() updating the hint. Instead, check the hint before
    we use it. This caused no measurable performance difference in my
    synthetic benchmarks.

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
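A minimal sketch of "check the hint before we use it", assuming a single-threaded model and hypothetical names; rand() stands in for whatever random source the kernel uses, and the per-cpu storage is reduced to a plain pointer.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical model: validate the allocation hint against the current
 * depth each time it is read, rather than fixing all hints up at
 * resize time.
 */
static unsigned int checked_hint(unsigned int *hint, unsigned int depth)
{
	assert(depth > 0);

	if (*hint >= depth)
		*hint = (unsigned int)rand() % depth;	/* stale hint: pick a new one */

	return *hint;
}
```

Because the check happens at the point of use, a hint left over from before the resize can never steer an allocation past the new depth.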
     
  • In order to get good cache behavior from a sbitmap, we want each CPU to
    stick to its own cacheline(s) as much as possible. This might happen
    naturally as the bitmap gets filled up and the alloc_hint values spread
    out, but we really want this behavior from the start. blk-mq apparently
    intended to do this, but the code to do this was never wired up. Get rid
    of the dead code and make it part of the sbitmap library.

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
     
  • Again, there's no point in passing this in every time. Make it part of
    struct sbitmap_queue and clean up the API.

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
     
  • Allocating your own per-cpu allocation hint separately makes for an
    awkward API. Instead, allocate the per-cpu hint as part of the struct
    sbitmap_queue. There's no point for a struct sbitmap_queue without the
    cache, but you can still use a bare struct sbitmap.

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
     
  • The original bt_alloc() we converted from was using kzalloc(), not
    kzalloc_node(), to allocate the wait queues. This was probably an
    oversight, so fix it for sbitmap_queue_init_node().

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
     
  • This is a generally useful data structure, so make it available to
    anyone else who might want to use it. It's also a nice cleanup
    separating the allocation logic from the rest of the tag handling logic.

    The code is behind a new Kconfig option, CONFIG_SBITMAP, which is only
    selected by CONFIG_BLOCK for now.

    This should be a complete noop functionality-wise.

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
     
  • We currently account a '0' dispatch, and anything above that still falls
    below the range set by BLK_MQ_MAX_DISPATCH_ORDER. If we dispatch more,
    we don't account it.

    Change the last bucket to be inclusive of anything above the range we
    track, and have the sysfs file reflect that by including a '+' in the
    output:

    $ cat /sys/block/nvme0n1/mq/0/dispatched
    0 1006
    1 20229
    2 1
    4 0
    8 0
    16 0
    32+ 0

    Signed-off-by: Jens Axboe
    Reviewed-by: Omar Sandoval

    Jens Axboe
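The bucketing behind the sample output above can be sketched like this. The names are hypothetical, and MAX_DISPATCH_ORDER is set to 7 only because the sample shows seven buckets; treat both as assumptions rather than the kernel's definitions.

```c
#include <assert.h>

#define MAX_DISPATCH_ORDER 7	/* assumption: seven buckets, as in the sample */

/* floor(log2(n)) for n > 0 */
static unsigned int ilog2_u(unsigned int n)
{
	unsigned int log = 0;

	while (n >>= 1)
		log++;
	return log;
}

/* Map a dispatch count to its histogram bucket: 0, 1, 2, 4, 8, 16, 32+. */
static unsigned int dispatch_bucket(unsigned int dispatched)
{
	unsigned int idx;

	if (dispatched == 0)
		return 0;

	idx = ilog2_u(dispatched) + 1;
	if (idx > MAX_DISPATCH_ORDER - 1)
		idx = MAX_DISPATCH_ORDER - 1;	/* last bucket absorbs everything larger */
	return idx;
}
```

Clamping to the last index is what makes the final bucket inclusive: a dispatch of 32 and a dispatch of 1000 both land in the "32+" row.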
     

15 Sep, 2016

1 commit

  • blk_mq_delay_kick_requeue_list() provides the ability to kick the
    q->requeue_list after a specified time. To do this the request_queue's
    'requeue_work' member was changed to a delayed_work.

    blk_mq_delay_kick_requeue_list() allows DM to defer processing requeued
    requests while it doesn't make sense to immediately requeue them
    (e.g. when all paths in a DM multipath have failed).

    Signed-off-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Mike Snitzer
     

14 Sep, 2016

11 commits

  • Signed-off-by: Christoph Hellwig
    Reviewed-by: Bart Van Assche
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Signed-off-by: Christoph Hellwig
    Reviewed-by: Bart Van Assche
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Since REQ_OP_BITS == 3 and __REQ_NR_BITS == 30 it is not that hard
    to pass an op_flags argument to bio_set_op_attrs() that is larger
    than the number of bits reserved for the op_flags argument. Complain
    if this happens. Additionally, ensure that negative arguments trigger
    a complaint (1 << ... is signed while 1U << ... is unsigned; adding
    0U to an integer expression causes it to be promoted to an unsigned
    type).

    Signed-off-by: Bart Van Assche
    Cc: Mike Christie
    Cc: Christoph Hellwig
    Cc: Hannes Reinecke
    Cc: Damien Le Moal
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Bart Van Assche
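The signed/unsigned remark in the commit above can be demonstrated in isolation. This sketch uses a hypothetical helper, fits_in_bits(), rather than the kernel macro; it shows how adding 0U turns a negative argument into a large unsigned value that fails the range check.

```c
#include <assert.h>
#include <limits.h>

/*
 * Return 1 if value fits in the given number of bits, 0 otherwise.
 * "value + 0U" converts the int to unsigned int, so a negative value
 * becomes a very large number and is rejected by the same comparison.
 */
static int fits_in_bits(int value, unsigned int bits)
{
	assert(bits < sizeof(unsigned int) * CHAR_BIT);

	return value + 0U < (1U << bits);
}
```

With a plain signed comparison, -1 would compare as less than 8 and slip through; the unsigned form catches it.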
     
  • Introduce the bio_flags() macro. Ensure that the second argument of
    bio_set_op_attrs() only contains flags and no operation. This patch
    does not change any functionality.

    Signed-off-by: Bart Van Assche
    Cc: Mike Christie
    Cc: Chris Mason (maintainer:BTRFS FILE SYSTEM)
    Cc: Josef Bacik (maintainer:BTRFS FILE SYSTEM)
    Cc: Mike Snitzer
    Cc: Hannes Reinecke
    Cc: Damien Le Moal
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Bart Van Assche
     
  • Make it clear that the sizeof(unsigned int) expression in BIO_OP_SHIFT
    refers to the bi_opf member of struct bio.

    Signed-off-by: Bart Van Assche
    Cc: Mike Christie
    Cc: Christoph Hellwig
    Cc: Hannes Reinecke
    Cc: Damien Le Moal
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Bart Van Assche
     
  • commit e1defc4ff0cf57aca6c5e3ff99fa503f5943c1f1
    "block: Do away with the notion of hardsect_size"
    removed the notion of "hardware sector size" from
    the kernel in favor of logical block size, but
    references remain in comments and documentation.

    Update the remaining sites mentioning hardsect.

    Signed-off-by: Linus Walleij
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Linus Walleij
     
  • The blk_mq_alloc_single_hw_queue() is a prototype artifact that
    should have been removed with
    commit cdef54dd85ad66e77262ea57796a3e81683dd5d6
    "blk-mq: remove alloc_hctx and free_hctx methods" where the last
    users of it were deleted.

    Fixes: cdef54dd85ad ("blk-mq: remove alloc_hctx and free_hctx methods")
    Signed-off-by: Linus Walleij
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Linus Walleij
     
  • DAX support for block devices was removed in commits 03cdad
    ("block: disable block device DAX by default") and 99a01cd
    ("block: remove BLK_DEV_DAX config option"), but we still kept a call to
    dax_do_io and some unneeded i_flags manipulations introduced in commit
    bbab37 ("block: Add support for DAX reads/writes to block devices").

    Remove those leftovers.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Johannes Thumshirn
    Acked-by: Dan Williams
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Allow the io_poll statistics to be zeroed to make for easier logging
    of polling events.

    Signed-off-by: Stephen Bates
    Acked-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Stephen Bates
     
  • In order to help determine the effectiveness of polling in a running
    system it is useful to determine the ratio of how often the poll
    function is called vs how often the completion is checked. For this
    reason we add a poll_considered variable and add it to the sysfs entry
    for io_poll.

    Signed-off-by: Stephen Bates
    Acked-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Stephen Bates
     

09 Sep, 2016

4 commits

  • Instead of rolling our own timer, just utilize the blk-mq request timeout
    and do the disconnect if any of our commands time out.

    Signed-off-by: Josef Bacik
    Signed-off-by: Jens Axboe

    Josef Bacik
     
  • In preparation for some future changes, change a few of the state bools over to
    normal bits to set/clear properly.

    Signed-off-by: Josef Bacik
    Signed-off-by: Jens Axboe

    Josef Bacik
     
  • We hit a warning when shutting down the nbd connection because we have IRQs
    disabled. We don't really need to do the shutdown under the lock; we only
    need to clear nbd->sock there. So do the shutdown outside of the
    IRQ-disabled section. This gets rid of the warning.

    Signed-off-by: Josef Bacik
    Signed-off-by: Jens Axboe

    Josef Bacik
     
  • This moves NBD over to using blk-mq, which allows us to get rid of the
    NBD-wide queue lock and the async submit kthread. We will start with 1 hw
    queue for now, but I plan to add multiple TCP connection support in the
    future, and we'll fix how we set the hw queues then.

    Signed-off-by: Josef Bacik
    Signed-off-by: Jens Axboe

    Josef Bacik
     

29 Aug, 2016

7 commits