30 Aug, 2017

3 commits


29 Aug, 2017

8 commits

  • The only caller of this function is blk_start_request() in the same
    file. Fix blk_start_request() description accordingly.

    Reviewed-by: Christoph Hellwig
    Reviewed-by: Bart Van Assche
    Signed-off-by: Damien Le Moal
    Signed-off-by: Jens Axboe

    Damien Le Moal
     
  • Since blk_mq_init_queue() initializes .nr_requests to the tag set
    size and since that value is a good default for the skd driver, do
    not overwrite the value set by blk_mq_init_queue(). This change
    doubles the default value of .nr_requests.

    Signed-off-by: Bart Van Assche
    Cc: Christoph Hellwig
    Cc: Hannes Reinecke
    Cc: Johannes Thumshirn
    Signed-off-by: Jens Axboe

    Bart Van Assche
     
  • Since sTec s1120 devices support 64-bit DMA it is not necessary
    to request data buffer bouncing. Hence remove the
    blk_queue_bounce_limit() call.

    Suggested-by: Christoph Hellwig
    Signed-off-by: Bart Van Assche
    Cc: Christoph Hellwig
    Cc: Hannes Reinecke
    Cc: Johannes Thumshirn
    Signed-off-by: Jens Axboe

    Bart Van Assche
     
  • Make this const as it is only passed as an argument to the
    functions device_create_file and device_remove_file, and the
    corresponding arguments are of type const.
    Done using Coccinelle.

    Signed-off-by: Bhumika Goyal
    Signed-off-by: Jens Axboe

    Bhumika Goyal
     
  • We already have this pointer, no need to use to_nullb_device()
    again.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Commit 2984c86 ("nullb: factor disk parameters") has a typo. The
    nullb_device allocation/free is done outside of null_add_dev. The commit
    accidentally frees the nullb_device in the error code path.

    Reported-by: Dan Carpenter
    Signed-off-by: Shaohua Li
    Signed-off-by: Jens Axboe

    Shaohua Li
     
  • There is a race between changing I/O elevator and request_queue removal
    which can trigger the warning in kobject_add_internal. A program can
    use sysfs to request a change of elevator at the same time another task
    is unregistering the request_queue the elevator would be attached to.
    The elevator's kobject will then attempt to be connected to the
    request_queue in the object tree when the request_queue has just been
    removed from sysfs. This triggers the warning in kobject_add_internal
    as the request_queue no longer has a sysfs directory:

    kobject_add_internal failed for iosched (error: -2 parent: queue)
    ------------[ cut here ]------------
    WARNING: CPU: 3 PID: 14075 at lib/kobject.c:244 kobject_add_internal+0x103/0x2d0

    To fix this warning, we can check the QUEUE_FLAG_REGISTERED flag when
    changing the elevator and use the request_queue's sysfs_lock to
    serialize between clearing the flag and the elevator testing the flag.

    Signed-off-by: David Jeffery
    Tested-by: Ming Lei
    Reviewed-by: Ming Lei
    Signed-off-by: Jens Axboe

    David Jeffery
     
  • The last parameter "count" is never used in xxx_var_store, so
    convert these functions to void.

    Signed-off-by: weiping zhang
    Signed-off-by: Jens Axboe

    weiping zhang
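
The elevator/request_queue race fix by David Jeffery in this group can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel patch: `struct queue_model` and its functions are hypothetical names, a pthread mutex stands in for the request_queue's sysfs_lock, a boolean stands in for QUEUE_FLAG_REGISTERED, and the `-ENOENT` return value is an assumption.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for a request_queue; not the kernel struct. */
struct queue_model {
    pthread_mutex_t sysfs_lock; /* plays the role of the queue's sysfs_lock */
    bool registered;            /* plays the role of QUEUE_FLAG_REGISTERED */
    const char *elevator;       /* name of the currently attached elevator */
};

static void queue_model_init(struct queue_model *q)
{
    pthread_mutex_init(&q->sysfs_lock, NULL);
    q->registered = true;
    q->elevator = "none";
}

/* Unregister path: clear the flag while holding the lock. */
static void queue_model_unregister(struct queue_model *q)
{
    pthread_mutex_lock(&q->sysfs_lock);
    q->registered = false;
    pthread_mutex_unlock(&q->sysfs_lock);
}

/* Elevator switch path: test the flag under the same lock, so the
 * switch either completes before unregistration or is refused. */
static int queue_model_set_elevator(struct queue_model *q, const char *name)
{
    int ret = 0;

    pthread_mutex_lock(&q->sysfs_lock);
    if (!q->registered)
        ret = -ENOENT; /* assumed error code for "queue already gone" */
    else
        q->elevator = name;
    pthread_mutex_unlock(&q->sysfs_lock);
    return ret;
}
```

Because both paths take the same lock, an elevator change can never observe a half-unregistered queue: it either runs entirely before the flag is cleared or sees the cleared flag and backs out.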
     

26 Aug, 2017

8 commits

  • The SKD_ID_INCR flag in skd_request_context.id duplicates information
    that is already available otherwise, e.g. through the block layer
    request state and through skd_request_context.state. Hence remove
    the code that manipulates this flag and also the flag itself.
    Since skd_isr_completion_posted() only uses the lower bits of
    skd_request_context.id as hardware tag, this patch does not change
    the behavior of the skd driver. I'm referring to the following code:

    tag = req_id & SKD_ID_SLOT_AND_TABLE_MASK;

    Signed-off-by: Bart Van Assche
    Cc: Christoph Hellwig
    Cc: Hannes Reinecke
    Cc: Johannes Thumshirn
    Signed-off-by: Jens Axboe

    Bart Van Assche
     
  • Although it is easy to see that skdev->disk != NULL if skdev->queue
    != NULL, add a test for skdev->disk to avoid that smatch reports the
    following warning:

    drivers/block/skd_main.c:3080 skd_free_disk()
    error: we previously assumed 'disk' could be null (see line 3074)

    Reported-by: Dan Carpenter
    Signed-off-by: Bart Van Assche
    Cc: Dan Carpenter
    Cc: Christoph Hellwig
    Cc: Hannes Reinecke
    Cc: Johannes Thumshirn
    Signed-off-by: Jens Axboe

    Bart Van Assche
     
  • It is not worth keeping the debug statements in skd_end_request().
    Without debug statements that function only consists of two
    statements. Hence inline skd_end_request().

    Suggested-by: Christoph Hellwig
    Signed-off-by: Bart Van Assche
    Cc: Christoph Hellwig
    Cc: Hannes Reinecke
    Cc: Johannes Thumshirn
    Signed-off-by: Jens Axboe

    Bart Van Assche
     
  • The latter name follows more closely the function names used in
    other blk-mq drivers.

    Suggested-by: Christoph Hellwig
    Signed-off-by: Bart Van Assche
    Cc: Christoph Hellwig
    Cc: Hannes Reinecke
    Cc: Johannes Thumshirn
    Signed-off-by: Jens Axboe

    Bart Van Assche
     
  • Dan reported this:

    The patch 2984c8684f96: "nullb: factor disk parameters" from Aug 14,
    2017, leads to the following Smatch complaint:

    drivers/block/null_blk.c:1759 null_init_tag_set()
    error: we previously assumed 'nullb' could be null (see line
    1750)

    1755 set->cmd_size = sizeof(struct nullb_cmd);
    1756 set->flags = BLK_MQ_F_SHOULD_MERGE;
    1757 set->driver_data = NULL;
    1758
    1759 if (nullb->dev->blocking)
    ^^^^^^^^^^^^^^^^^^^^
    And an unchecked dereference.

    nullb could be NULL here.

    Reported-by: Dan Carpenter
    Signed-off-by: Shaohua Li
    Signed-off-by: Jens Axboe

    Shaohua Li
     
  • This patch fixes two errors: first, avoid kfree of blk_root; second,
    do not free(blkcg) if the blkcg allocation failed (blkcg == NULL),
    just unlock the mutex.

    Signed-off-by: weiping zhang
    Signed-off-by: Jens Axboe

    weiping zhang
     
  • Update to a working one, the fusionio address hasn't been valid
    in 4 years.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Normally I wouldn't bother with this, but in my opinion the comments are
    the most important part of this whole file since without them no one
    would have any clue how this insanity works.

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
     

24 Aug, 2017

15 commits

  • This patch avoids that sparse reports the following warning messages:

    block/compat_ioctl.c:85:11: warning: incorrect type in assignment (different address spaces)
    block/compat_ioctl.c:85:11: expected unsigned long *[noderef] p
    block/compat_ioctl.c:85:11: got void [noderef] *
    block/compat_ioctl.c:91:21: warning: incorrect type in argument 1 (different address spaces)
    block/compat_ioctl.c:91:21: expected void const volatile [noderef] *
    block/compat_ioctl.c:91:21: got unsigned long *[noderef] p
    block/compat_ioctl.c:87:53: warning: dereference of noderef expression
    block/compat_ioctl.c:91:21: warning: dereference of noderef expression

    Fixes: commit d597580d3737 ("generic ...copy_..._user primitives")
    Signed-off-by: Bart Van Assche
    Cc: Jens Axboe
    Signed-off-by: Jens Axboe

    Bart Van Assche
     
  • put_device(pdev) will call pdev->type->release finally, and blk_free_devt
    has been called in part_release(), so remove it.

    Signed-off-by: weiping zhang
    Signed-off-by: Jens Axboe

    weiping zhang
     
  • In the dm-integrity target we register an integrity profile that has
    both the generate_fn and verify_fn callbacks set to NULL.

    This is used if dm-integrity is stacked under a dm-crypt device
    for authenticated encryption (the integrity payload contains the
    authentication tag and IV seed).

    In this case the verification is done through dm-crypt's own crypto
    API processing; the integrity profile is only a holder of this data.
    (The memory is owned by dm-crypt as well.)

    After the commit (and previous changes)
    Commit 7c20f11680a441df09de7235206f70115fbf6290
    Author: Christoph Hellwig
    Date: Mon Jul 3 16:58:43 2017 -0600

    bio-integrity: stop abusing bi_end_io

    we get this crash:

    : BUG: unable to handle kernel NULL pointer dereference at (null)
    : IP: (null)
    : *pde = 00000000
    ...
    :
    : Workqueue: kintegrityd bio_integrity_verify_fn
    : task: f48ae180 task.stack: f4b5c000
    : EIP: (null)
    : EFLAGS: 00210286 CPU: 0
    : EAX: f4b5debc EBX: 00001000 ECX: 00000001 EDX: 00000000
    : ESI: 00001000 EDI: ed25f000 EBP: f4b5dee8 ESP: f4b5dea4
    : DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
    : CR0: 80050033 CR2: 00000000 CR3: 32823000 CR4: 001406d0
    : Call Trace:
    : ? bio_integrity_process+0xe3/0x1e0
    : bio_integrity_verify_fn+0xea/0x150
    : process_one_work+0x1c7/0x5c0
    : worker_thread+0x39/0x380
    : kthread+0xd6/0x110
    : ? process_one_work+0x5c0/0x5c0
    : ? kthread_worker_fn+0x100/0x100
    : ? kthread_worker_fn+0x100/0x100
    : ret_from_fork+0x19/0x24
    : Code: Bad EIP value.
    : EIP: (null) SS:ESP: 0068:f4b5dea4
    : CR2: 0000000000000000

    This patch simply skips the whole verify workqueue if verify_fn is set
    to NULL.

    Fixes: 7c20f116 ("bio-integrity: stop abusing bi_end_io")
    Signed-off-by: Milan Broz
    [hch: trivial whitespace fix]
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Milan Broz
     
  • If elv_register fails, bfq_pool should be freed.

    Signed-off-by: weiping zhang
    Signed-off-by: Jens Axboe

    weiping zhang
     
  • This way we don't need a block_device structure to submit I/O. The
    block_device has different life time rules from the gendisk and
    request_queue and is usually only available when the block device node
    is open. Other callers need to explicitly create one (e.g. the lightnvm
    passthrough code, or the new nvme multipathing code).

    For the actual I/O path all that we need is the gendisk, which exists
    once per block device. But given that the block layer also does
    partition remapping we additionally need a partition index, which is
    used for said remapping in generic_make_request.

    Note that all the block drivers generally want request_queue or
    sometimes the gendisk, so this removes a layer of indirection all
    over the stack.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • This helper allows looking up a partition under RCU protection without
    grabbing a reference to it.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • The block layer always remaps partitions before calling into the
    ->make_request methods of drivers. Thus the call to get_start_sect in
    in_chunk_boundary will always return 0 and can be removed.

    Reviewed-by: Shaohua Li
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • We won't have the struct block_device available in the bio soon, so switch
    to the numerical dev_t instead of the block_device pointer for looking up
    the check-integrity state.

    Reviewed-by: Liu Bo
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Since MSI support on some motherboards is unreliable, change the
    default interrupt mode from MSI to MSI-X. This patch avoids that
    the following message appears sporadically in the kernel logs of
    my test setup:

    do_IRQ: 3.193 No irq handler for vector

    Signed-off-by: Bart Van Assche
    Cc: Christoph Hellwig
    Cc: Hannes Reinecke
    Cc: Johannes Thumshirn
    Signed-off-by: Jens Axboe

    Bart Van Assche
     
  • Avoid that normal request completion and the timeout handler can
    run concurrently by calling blk_mq_complete_request() instead of
    blk_mq_end_request() from skd_end_request(). Avoid that the block
    layer can reuse a request while the firmware is still processing
    it. Convert skd_softirq_done() to blk-mq. Pass the pointer to
    skd_softirq_done() to the block layer core through
    blk_mq_ops.complete instead of by calling blk_queue_softirq_done().
    Pass the pointer to skd_timed_out() to the block layer core
    through blk_mq_ops.timeout instead of by calling
    blk_queue_rq_timed_out(). The timeout handler has been tested as
    follows:

    echo 1 > /sys/block/skd0/io-timeout-fail &&
    (cd /sys/kernel/debug/fail_io_timeout &&
    echo 100 > probability &&
    echo N > task-filter &&
    echo 1 > times)

    Fixes: commit a74d5b76fab9 ("skd: Switch to block layer timeout mechanism")
    Reported-by: Christoph Hellwig
    Signed-off-by: Bart Van Assche
    Cc: Christoph Hellwig
    Cc: Hannes Reinecke
    Cc: Johannes Thumshirn
    Signed-off-by: Jens Axboe

    Bart Van Assche
     
  • This patch does not change any functionality but makes the skd
    driver code more similar to that of other blk-mq kernel drivers.

    Signed-off-by: Bart Van Assche
    Cc: Christoph Hellwig
    Cc: Hannes Reinecke
    Cc: Johannes Thumshirn
    Signed-off-by: Jens Axboe

    Bart Van Assche
     
  • This patch removes one debug statement but otherwise does not change
    any functionality.

    Signed-off-by: Bart Van Assche
    Cc: Christoph Hellwig
    Cc: Hannes Reinecke
    Cc: Johannes Thumshirn
    Signed-off-by: Jens Axboe

    Bart Van Assche
     
  • The timeout handler set by blk_queue_rq_timed_out() is only used
    in single queue mode. Calling this function for blk-mq drivers is
    wrong. Hence issue a warning if this function is called by a blk-mq
    driver.

    Signed-off-by: Bart Van Assche
    Cc: Christoph Hellwig
    Cc: Hannes Reinecke
    Cc: Johannes Thumshirn
    Signed-off-by: Jens Axboe

    Bart Van Assche
     

23 Aug, 2017

6 commits

  • Sometimes a disk has broken tracks whose data is inaccessible, while
    data in other parts can be accessed normally. MD RAID supports
    such disks. But we don't have a good way to test it, because we can't
    control which part of a physical disk is bad. For a virtual disk, this
    can be easily controlled.

    This patch adds a new 'badblock' attribute. Configure it in this way:
    echo "+1-100" > xxx/badblock, this marks sectors [1-100] as bad
    blocks.
    echo "-20-30" > xxx/badblock, this marks sectors [20-30] as good.

    If bad blocks are accessed, the nullb disk returns an I/O error. Other
    parts of the disk can be accessed normally.

    Signed-off-by: Shaohua Li
    Signed-off-by: Jens Axboe

    Shaohua Li
     
  • Software must flush the disk cache to guarantee data safety. To check
    whether software flushes the disk cache correctly, we must know the
    behavior of the disk. But physical disk behavior is uncontrollable.
    Even if software doesn't do the flush, the disk probably does. This
    patch tries to emulate a cache in the test disk.

    All writes go to a cache first; when the cache is full, we flush
    some data to disk storage. A flush request flushes all data in
    the cache to disk storage. A FUA write will write to memory store
    directly and revalidate data in cache. If there is a power failure (by
    writing to the power attribute, 'echo 0 > disk_name/power'), we discard
    all data in the cache, but preserve the data in disk storage. Later we
    can power on the disk again as usual (write 1 to the 'power' attribute),
    then check data integrity and verify that software does everything
    correctly.

    A new attribute 'cache_size' (in MB) is added to configure cache size.

    Based on original patch from Kyungchan Koh

    Signed-off-by: Kyungchan Koh
    Signed-off-by: Shaohua Li
    Signed-off-by: Jens Axboe

    Shaohua Li
     
  • In tests, we usually expect controllable disk speed. For example, in a
    raid array, we'd like some disks to be fast and some to be slow. MD RAID
    actually has a feature for this. To test the feature, we'd like to make
    the disk run at a specific speed.

    Block throttling can probably be used for this purpose, but it requires
    cgroup setup. Here we just implement a simple throttling mechanism in
    the driver. There is slight fluctuation in the mechanism, but it's good
    enough for testing.

    To configure the bandwidth cap, the user sets the 'mbps' attribute. mbps
    is MB/s.

    Based on original patch from Kyungchan Koh

    Signed-off-by: Kyungchan Koh
    Signed-off-by: Shaohua Li
    Signed-off-by: Jens Axboe

    Shaohua Li
     
  • Discard makes sense for a memory backed disk. It is also useful to test
    whether the upper layer supports discard correctly.

    The user configures the 'discard' attribute to enable/disable discard
    support.

    Based on original patch from Kyungchan Koh

    Signed-off-by: Kyungchan Koh
    Signed-off-by: Shaohua Li
    Signed-off-by: Jens Axboe

    Shaohua Li
     
  • This adds memory backed store in nullb.

    The user configures the 'memory_backed' attribute for this. By default,
    a nullb disk doesn't use the memory backed store.

    Based on original patch from Kyungchan Koh

    Signed-off-by: Kyungchan Koh
    Signed-off-by: Shaohua Li
    Signed-off-by: Jens Axboe

    Shaohua Li
     
  • We now dynamically create disks. Manage the disk index with an ida to
    avoid bumping up the index too much.

    Signed-off-by: Shaohua Li
    Signed-off-by: Jens Axboe

    Shaohua Li
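
The 'badblock' attribute described in the first commit of this group takes strings such as "+1-100" (mark bad) and "-20-30" (mark good). How such a string could be split into an operation and a sector range can be sketched as follows. This is an illustrative user-space sketch, not the null_blk driver code: `struct badblock_cmd` and `parse_badblock_cmd()` are hypothetical names, and accepting one trailing newline (as written by `echo`) is an assumption.

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical result type; not taken from the null_blk driver. */
struct badblock_cmd {
    bool mark_bad;            /* true for '+', false for '-' */
    unsigned long long first; /* first sector, inclusive */
    unsigned long long last;  /* last sector, inclusive */
};

/* Parse "+A-B" or "-A-B". Returns 0 on success, -1 on malformed input. */
static int parse_badblock_cmd(const char *s, struct badblock_cmd *out)
{
    unsigned long long first, last;
    char *end;

    /* The first character selects the operation. */
    if (s[0] != '+' && s[0] != '-')
        return -1;
    /* Require a digit so strtoull() cannot consume a sign or spaces. */
    if (!isdigit((unsigned char)s[1]))
        return -1;

    errno = 0;
    first = strtoull(s + 1, &end, 10);
    if (errno || *end != '-' || !isdigit((unsigned char)end[1]))
        return -1;

    last = strtoull(end + 1, &end, 10);
    if (errno)
        return -1;

    /* Assumption: tolerate the single newline that echo appends. */
    if (*end == '\n')
        end++;
    if (*end != '\0' || first > last)
        return -1;

    out->mark_bad = (s[0] == '+');
    out->first = first;
    out->last = last;
    return 0;
}
```

Called with "+1-100", the sketch fills in a mark-bad command for sectors 1 through 100; called with "-20-30", it fills in a mark-good command for sectors 20 through 30. Input without a leading sign or with a reversed range is rejected.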