31 Mar, 2020

3 commits

  • Pull EFI updates from Ingo Molnar:
    "The EFI changes in this cycle are much larger than usual, for two
    (positive) reasons:

    - The GRUB project is showing signs of life again, resulting in the
    introduction of the generic Linux/UEFI boot protocol, instead of
    x86 specific hacks which are increasingly difficult to maintain.
    There's hope that all future extensions will now go through that
    boot protocol.

    - Preparatory work for RISC-V EFI support.

    The main changes are:

    - Boot time GDT handling changes

    - Simplify handling of EFI properties table on arm64

    - Generic EFI stub cleanups, to improve command line handling, file
    I/O, memory allocation, etc.

    - Introduce a generic initrd loading method based on calling back
    into the firmware, instead of relying on the x86 EFI handover
    protocol or device tree.

    - Introduce a mixed mode boot method that does not rely on the x86
    EFI handover protocol either, and could potentially be adopted by
    other architectures (if another one ever surfaces where one
    execution mode is a superset of another)

    - Clean up the contents of 'struct efi', and move out everything that
    doesn't need to be stored there.

    - Incorporate support for UEFI spec v2.8A changes that permit
    firmware implementations to return EFI_UNSUPPORTED from UEFI
    runtime services at OS runtime, and expose a mask of which ones are
    supported or unsupported via a configuration table.

    - Partial fix for the lack of by-VA cache maintenance in the
    decompressor on 32-bit ARM.

    - Changes to load device firmware from EFI boot service memory
    regions

    - Various documentation updates and minor code cleanups and fixes"

    * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits)
    efi/libstub/arm: Fix spurious message that an initrd was loaded
    efi/libstub/arm64: Avoid image_base value from efi_loaded_image
    partitions/efi: Fix partition name parsing in GUID partition entry
    efi/x86: Fix cast of image argument
    efi/libstub/x86: Use ULONG_MAX as upper bound for all allocations
    efi: Fix a mistype in comments mentioning efivar_entry_iter_begin()
    efi/libstub: Avoid linking libstub/lib-ksyms.o into vmlinux
    efi/x86: Preserve %ebx correctly in efi_set_virtual_address_map()
    efi/x86: Ignore the memory attributes table on i386
    efi/x86: Don't relocate the kernel unless necessary
    efi/x86: Remove extra headroom for setup block
    efi/x86: Add kernel preferred address to PE header
    efi/x86: Decompress at start of PE image load address
    x86/boot/compressed/32: Save the output address instead of recalculating it
    efi/libstub/x86: Deal with exit() boot service returning
    x86/boot: Use unsigned comparison for addresses
    efi/x86: Avoid using code32_start
    efi/x86: Make efi32_pe_entry() more readable
    efi/x86: Respect 32-bit ABI in efi32_pe_entry()
    efi/x86: Annotate the LOADED_IMAGE_PROTOCOL_GUID with SYM_DATA
    ...

    Linus Torvalds
     
  • Pull block driver updates from Jens Axboe:

    - floppy driver cleanup series from Willy

    - NVMe updates and fixes (Various)

    - null_blk trace improvements (Chaitanya)

    - bcache fixes (Coly)

    - md fixes (via Song)

    - loop block size change optimizations (Martijn)

    - scnprintf() use (Takashi)

    * tag 'for-5.7/drivers-2020-03-29' of git://git.kernel.dk/linux-block: (81 commits)
    null_blk: add trace in null_blk_zoned.c
    null_blk: add tracepoint helpers for zoned mode
    block: add a zone condition debug helper
    nvme: cleanup namespace identifier reporting in nvme_init_ns_head
    nvme: rename __nvme_find_ns_head to nvme_find_ns_head
    nvme: refactor nvme_identify_ns_descs error handling
    nvme-tcp: Add warning on state change failure at nvme_tcp_setup_ctrl
    nvme-rdma: Add warning on state change failure at nvme_rdma_setup_ctrl
    nvme: Fix controller creation races with teardown flow
    nvme: Make nvme_uninit_ctrl symmetric to nvme_init_ctrl
    nvme: Fix ctrl use-after-free during sysfs deletion
    nvme-pci: Re-order nvme_pci_free_ctrl
    nvme: Remove unused return code from nvme_delete_ctrl_sync
    nvme: Use nvme_state_terminal helper
    nvme: release ida resources
    nvme: Add compat_ioctl handler for NVME_IOCTL_SUBMIT_IO
    nvmet-tcp: optimize tcp stack TX when data digest is used
    nvme-fabrics: Use scnprintf() for avoiding potential buffer overflow
    nvme-multipath: do not reset on unknown status
    nvmet-rdma: allocate RW ctxs according to mdts
    ...

    Linus Torvalds
     
  • Pull block updates from Jens Axboe:

    - Online capacity resizing (Balbir)

    - Number of hardware queue change fixes (Bart)

    - null_blk fault injection addition (Bart)

    - Cleanup of queue allocation, unifying the node/no-node API
    (Christoph)

    - Cleanup of genhd, moving code to where it makes sense (Christoph)

    - Cleanup of the partition handling code (Christoph)

    - disk stat fixes/improvements (Konstantin)

    - BFQ improvements (Paolo)

    - Various fixes and improvements

    * tag 'for-5.7/block-2020-03-29' of git://git.kernel.dk/linux-block: (72 commits)
    block: return NULL in blk_alloc_queue() on error
    block: move bio_map_* to blk-map.c
    Revert "blkdev: check for valid request queue before issuing flush"
    block: simplify queue allocation
    bcache: pass the make_request methods to blk_queue_make_request
    null_blk: use blk_mq_init_queue_data
    block: add a blk_mq_init_queue_data helper
    block: move the ->devnode callback to struct block_device_operations
    block: move the part_stat* helpers from genhd.h to a new header
    block: move block layer internals out of include/linux/genhd.h
    block: move guard_bio_eod to bio.c
    block: unexport get_gendisk
    block: unexport disk_map_sector_rcu
    block: unexport disk_get_part
    block: mark part_in_flight and part_in_flight_rw static
    block: mark block_depr static
    block: factor out requeue handling from dispatch code
    block/diskstats: replace time_in_queue with sum of request times
    block/diskstats: accumulate all per-cpu counters in one pass
    block/diskstats: more accurate approximation of io_ticks for slow disks
    ...

    Linus Torvalds
     

30 Mar, 2020

1 commit

  • This patch fixes follwoing warning:

    block/blk-core.c: In function ‘blk_alloc_queue’:
    block/blk-core.c:558:10: warning: returning ‘int’ from a function with return type ‘struct request_queue *’ makes pointer from integer without a cast [-Wint-conversion]
    return -EINVAL;

    Fixes: 3d745ea5b095a ("block: simplify queue allocation")
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Chaitanya Kulkarni
    Signed-off-by: Jens Axboe

    Chaitanya Kulkarni
     

28 Mar, 2020

5 commits

  • Add a helper to stringify the zone conditions. We use this helper in the
    next patch to track zone conditions in tracepoints.

    Reviewed-by: Damien Le Moal
    Signed-off-by: Chaitanya Kulkarni
    Signed-off-by: Jens Axboe

    Chaitanya Kulkarni
     
  • The bio_map_* helpers are just the low-level helpers for the
    blk_rq_map_* APIs. Move them together for better logical grouping,
    as no there isn't much overlap with other code in bio.c.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • This reverts commit f10d9f617a65905c556c3b37c9b9646ae7d04ed7.

    We can't have queues without a make_request_fn any more (and the
    loop device uses blk-mq these days anyway..).

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Current make_request based drivers use either blk_alloc_queue_node or
    blk_alloc_queue to allocate a queue, and then set up the make_request_fn
    function pointer and a few parameters using the blk_queue_make_request
    helper. Simplify this by passing the make_request pointer to
    blk_alloc_queue, and while at it merge the _node variant into the main
    helper by always passing a node_id, and remove the superfluous gfp_mask
    parameter. A lower-level __blk_alloc_queue is kept for the blk-mq case.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • This allows a driver to pass a queuedata member before ->init_hctx is
    called. null_blk currently open codes this logic, but I'd rather have
    it in the core to ease future maintainance.

    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

27 Mar, 2020

1 commit


25 Mar, 2020

12 commits

  • These macros are just used by a few files. Move them out of genhd.h,
    which is included everywhere into a new standalone header.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • None of this needs to be exposed to drivers.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • This is bio layer functionality and not related to buffer heads.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • get_gendisk is not used by any modular code.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • disk_map_sector_rcu is not used by any modular code.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • disk_get_part is not used by any modular code.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Factor out the requeue handling from the dispatch code, this will make
    subsequent addition of different requeueing schemes easier.

    Signed-off-by: Johannes Thumshirn
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Johannes Thumshirn
     
  • Column "time_in_queue" in diskstats is supposed to show total waiting time
    of all requests. I.e. value should be equal to the sum of times from other
    columns. But this is not true, because column "time_in_queue" is counted
    separately in jiffies rather than in nanoseconds as other times.

    This patch removes redundant counter for "time_in_queue" and shows total
    time of read, write, discard and flush requests.

    Signed-off-by: Konstantin Khlebnikov
    Signed-off-by: Jens Axboe

    Konstantin Khlebnikov
     
  • Reading /proc/diskstats iterates over all cpus for summing each field.
    It's faster to sum all fields in one pass.

    Hammering /proc/diskstats with fio shows 2x performance improvement:

    fio --name=test --numjobs=$JOBS --filename=/proc/diskstats \
    --size=1k --bs=1k --fallocate=none --create_on_open=1 \
    --time_based=1 --runtime=10 --invalidate=0 --group_report

    JOBS=1 JOBS=10
    Before: 7k iops 64k iops
    After: 18k iops 120k iops

    Also this way code is more compact:

    add/remove: 1/0 grow/shrink: 0/2 up/down: 194/-1540 (-1346)
    Function old new delta
    part_stat_read_all - 194 +194
    diskstats_show 1344 631 -713
    part_stat_show 1219 392 -827
    Total: Before=14966947, After=14965601, chg -0.01%

    Signed-off-by: Konstantin Khlebnikov
    Signed-off-by: Jens Axboe

    Konstantin Khlebnikov
     
  • Currently io_ticks is approximated by adding one at each start and end of
    requests if jiffies counter has changed. This works perfectly for requests
    shorter than a jiffy or if one of requests starts/ends at each jiffy.

    If disk executes just one request at a time and they are longer than two
    jiffies then only first and last jiffies will be accounted.

    Fix is simple: at the end of request add up into io_ticks jiffies passed
    since last update rather than just one jiffy.

    Example: common HDD executes random read 4k requests around 12ms.

    fio --name=test --filename=/dev/sdb --rw=randread --direct=1 --runtime=30 &
    iostat -x 10 sdb

    Note changes of iostat's "%util" 8,43% -> 99,99% before/after patch:

    Before:

    Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
    sdb 0,00 0,00 82,60 0,00 330,40 0,00 8,00 0,96 12,09 12,09 0,00 1,02 8,43

    After:

    Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
    sdb 0,00 0,00 82,50 0,00 330,00 0,00 8,00 1,00 12,10 12,10 0,00 12,12 99,99

    Now io_ticks does not loose time between start and end of requests, but
    for queue-depth > 1 some I/O time between adjacent starts might be lost.

    For load estimation "%util" is not as useful as average queue length,
    but it clearly shows how often disk queue is completely empty.

    Fixes: 5b18b5a73760 ("block: delete part_round_stats and switch to less precise counting")
    Signed-off-by: Konstantin Khlebnikov
    Reviewed-by: Ming Lei
    Signed-off-by: Jens Axboe

    Konstantin Khlebnikov
     

24 Mar, 2020

18 commits