31 Mar, 2020
3 commits
-
Pull EFI updates from Ingo Molnar:
"The EFI changes in this cycle are much larger than usual, for two
(positive) reasons:

- The GRUB project is showing signs of life again, resulting in the
  introduction of the generic Linux/UEFI boot protocol, instead of
  x86 specific hacks which are increasingly difficult to maintain.
  There's hope that all future extensions will now go through that
  boot protocol.
- Preparatory work for RISC-V EFI support.

The main changes are:

- Boot time GDT handling changes
- Simplify handling of EFI properties table on arm64
- Generic EFI stub cleanups, to improve command line handling, file
  I/O, memory allocation, etc.
- Introduce a generic initrd loading method based on calling back
  into the firmware, instead of relying on the x86 EFI handover
  protocol or device tree.
- Introduce a mixed mode boot method that does not rely on the x86
  EFI handover protocol either, and could potentially be adopted by
  other architectures (if another one ever surfaces where one
  execution mode is a superset of another)
- Clean up the contents of 'struct efi', and move out everything that
  doesn't need to be stored there.
- Incorporate support for UEFI spec v2.8A changes that permit
  firmware implementations to return EFI_UNSUPPORTED from UEFI
  runtime services at OS runtime, and expose a mask of which ones are
  supported or unsupported via a configuration table.
- Partial fix for the lack of by-VA cache maintenance in the
  decompressor on 32-bit ARM.
- Changes to load device firmware from EFI boot service memory
  regions
- Various documentation updates and minor code cleanups and fixes"
* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits)
efi/libstub/arm: Fix spurious message that an initrd was loaded
efi/libstub/arm64: Avoid image_base value from efi_loaded_image
partitions/efi: Fix partition name parsing in GUID partition entry
efi/x86: Fix cast of image argument
efi/libstub/x86: Use ULONG_MAX as upper bound for all allocations
efi: Fix a mistype in comments mentioning efivar_entry_iter_begin()
efi/libstub: Avoid linking libstub/lib-ksyms.o into vmlinux
efi/x86: Preserve %ebx correctly in efi_set_virtual_address_map()
efi/x86: Ignore the memory attributes table on i386
efi/x86: Don't relocate the kernel unless necessary
efi/x86: Remove extra headroom for setup block
efi/x86: Add kernel preferred address to PE header
efi/x86: Decompress at start of PE image load address
x86/boot/compressed/32: Save the output address instead of recalculating it
efi/libstub/x86: Deal with exit() boot service returning
x86/boot: Use unsigned comparison for addresses
efi/x86: Avoid using code32_start
efi/x86: Make efi32_pe_entry() more readable
efi/x86: Respect 32-bit ABI in efi32_pe_entry()
efi/x86: Annotate the LOADED_IMAGE_PROTOCOL_GUID with SYM_DATA
...
-
Pull block driver updates from Jens Axboe:
- floppy driver cleanup series from Willy
- NVMe updates and fixes (Various)
- null_blk trace improvements (Chaitanya)
- bcache fixes (Coly)
- md fixes (via Song)
- loop block size change optimizations (Martijn)
- scnprintf() use (Takashi)
* tag 'for-5.7/drivers-2020-03-29' of git://git.kernel.dk/linux-block: (81 commits)
null_blk: add trace in null_blk_zoned.c
null_blk: add tracepoint helpers for zoned mode
block: add a zone condition debug helper
nvme: cleanup namespace identifier reporting in nvme_init_ns_head
nvme: rename __nvme_find_ns_head to nvme_find_ns_head
nvme: refactor nvme_identify_ns_descs error handling
nvme-tcp: Add warning on state change failure at nvme_tcp_setup_ctrl
nvme-rdma: Add warning on state change failure at nvme_rdma_setup_ctrl
nvme: Fix controller creation races with teardown flow
nvme: Make nvme_uninit_ctrl symmetric to nvme_init_ctrl
nvme: Fix ctrl use-after-free during sysfs deletion
nvme-pci: Re-order nvme_pci_free_ctrl
nvme: Remove unused return code from nvme_delete_ctrl_sync
nvme: Use nvme_state_terminal helper
nvme: release ida resources
nvme: Add compat_ioctl handler for NVME_IOCTL_SUBMIT_IO
nvmet-tcp: optimize tcp stack TX when data digest is used
nvme-fabrics: Use scnprintf() for avoiding potential buffer overflow
nvme-multipath: do not reset on unknown status
nvmet-rdma: allocate RW ctxs according to mdts
...
-
Pull block updates from Jens Axboe:
- Online capacity resizing (Balbir)
- Number of hardware queue change fixes (Bart)
- null_blk fault injection addition (Bart)
- Cleanup of queue allocation, unifying the node/no-node API
  (Christoph)
- Cleanup of genhd, moving code to where it makes sense (Christoph)
- Cleanup of the partition handling code (Christoph)
- disk stat fixes/improvements (Konstantin)
- BFQ improvements (Paolo)
- Various fixes and improvements
* tag 'for-5.7/block-2020-03-29' of git://git.kernel.dk/linux-block: (72 commits)
block: return NULL in blk_alloc_queue() on error
block: move bio_map_* to blk-map.c
Revert "blkdev: check for valid request queue before issuing flush"
block: simplify queue allocation
bcache: pass the make_request methods to blk_queue_make_request
null_blk: use blk_mq_init_queue_data
block: add a blk_mq_init_queue_data helper
block: move the ->devnode callback to struct block_device_operations
block: move the part_stat* helpers from genhd.h to a new header
block: move block layer internals out of include/linux/genhd.h
block: move guard_bio_eod to bio.c
block: unexport get_gendisk
block: unexport disk_map_sector_rcu
block: unexport disk_get_part
block: mark part_in_flight and part_in_flight_rw static
block: mark block_depr static
block: factor out requeue handling from dispatch code
block/diskstats: replace time_in_queue with sum of request times
block/diskstats: accumulate all per-cpu counters in one pass
block/diskstats: more accurate approximation of io_ticks for slow disks
...
30 Mar, 2020
1 commit
-
This patch fixes the following warning:
block/blk-core.c: In function ‘blk_alloc_queue’:
block/blk-core.c:558:10: warning: returning ‘int’ from a function with return type ‘struct request_queue *’ makes pointer from integer without a cast [-Wint-conversion]
return -EINVAL;
Fixes: 3d745ea5b095a ("block: simplify queue allocation")
Reviewed-by: Christoph Hellwig
Signed-off-by: Chaitanya Kulkarni
Signed-off-by: Jens Axboe
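The warning above comes from returning a negative errno value out of a function whose return type is a pointer. A minimal userspace sketch of the corrected pattern follows. The names struct queue, make_request_fn, alloc_queue, setup_device and noop_make_request are hypothetical stand-ins for illustration, not the kernel's actual definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct request_queue. */
struct queue {
    int node_id;
};

/* Hypothetical callback type, playing the role of a make_request function. */
typedef int (*make_request_fn)(struct queue *q);

/* Trivial callback used only to exercise the sketch. */
int noop_make_request(struct queue *q)
{
    assert(q != NULL);
    return 0;
}

/*
 * Allocator with a pointer return type. Writing "return -EINVAL;" here
 * would convert an int to a pointer (-Wint-conversion), so every failure
 * path returns NULL instead.
 */
struct queue *alloc_queue(make_request_fn fn, int node_id)
{
    struct queue *q;

    if (fn == NULL)
        return NULL;            /* not "return -EINVAL;" */

    q = calloc(1, sizeof(*q));
    if (q == NULL)
        return NULL;

    q->node_id = node_id;
    return q;
}

/* Callers test for NULL and pick their own error code. */
int setup_device(make_request_fn fn)
{
    struct queue *q = alloc_queue(fn, 0);
    int ret;

    if (q == NULL)
        return -ENOMEM;

    ret = fn(q);
    free(q);
    return ret;
}
```

With this shape the caller cannot distinguish a bad argument from an allocation failure. That is the trade-off of a NULL-only error convention.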
28 Mar, 2020
5 commits
-
Add a helper to stringify the zone conditions. We use this helper in the
next patch to track zone conditions in tracepoints.
Reviewed-by: Damien Le Moal
Signed-off-by: Chaitanya Kulkarni
Signed-off-by: Jens Axboe
-
The bio_map_* helpers are just the low-level helpers for the
blk_rq_map_* APIs. Move them together for better logical grouping,
as there isn't much overlap with other code in bio.c.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
This reverts commit f10d9f617a65905c556c3b37c9b9646ae7d04ed7.
We can't have queues without a make_request_fn any more (and the
loop device uses blk-mq these days anyway..).
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
Current make_request based drivers use either blk_alloc_queue_node or
blk_alloc_queue to allocate a queue, and then set up the make_request_fn
function pointer and a few parameters using the blk_queue_make_request
helper. Simplify this by passing the make_request pointer to
blk_alloc_queue, and while at it merge the _node variant into the main
helper by always passing a node_id, and remove the superfluous gfp_mask
parameter. A lower-level __blk_alloc_queue is kept for the blk-mq case.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
This allows a driver to pass a queuedata member before ->init_hctx is
called. null_blk currently open codes this logic, but I'd rather have
it in the core to ease future maintenance.
Reviewed-by: Johannes Thumshirn
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
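The ordering problem this helper addresses can be sketched in userspace as follows: the driver-private pointer has to be stored in the queue before the per-hardware-queue init hook runs, because the hook may read it. All names here (struct queue, struct queue_ops, init_queue_data, init_queue, struct demo_dev, demo_init_hctx) are hypothetical and only model the idea. They are not the blk-mq API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct queue;

/* Hypothetical ops table; init_hctx models the hardware-queue init hook. */
struct queue_ops {
    int (*init_hctx)(struct queue *q);
};

/* Hypothetical queue with a driver-private pointer. */
struct queue {
    const struct queue_ops *ops;
    void *queuedata;
    int hctx_ready;
};

/* Allocate a queue and set queuedata before the init hook is called. */
struct queue *init_queue_data(const struct queue_ops *ops, void *queuedata)
{
    struct queue *q = calloc(1, sizeof(*q));

    if (q == NULL)
        return NULL;

    q->ops = ops;
    q->queuedata = queuedata;   /* must happen before init_hctx runs */

    if (ops->init_hctx != NULL && ops->init_hctx(q) != 0) {
        free(q);
        return NULL;
    }
    return q;
}

/* The plain entry point becomes a thin wrapper with no private data. */
struct queue *init_queue(const struct queue_ops *ops)
{
    return init_queue_data(ops, NULL);
}

/* Example driver state and a hook that depends on queuedata. */
struct demo_dev {
    int id;
};

int demo_init_hctx(struct queue *q)
{
    struct demo_dev *dev;

    assert(q != NULL);
    dev = q->queuedata;
    if (dev == NULL)
        return -1;              /* hook cannot work without driver data */

    q->hctx_ready = dev->id;
    return 0;
}

const struct queue_ops demo_ops = { .init_hctx = demo_init_hctx };
```

In this model a driver whose hook needs its private data calls init_queue_data() directly, while drivers that do not need it can keep using init_queue().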
27 Mar, 2020
1 commit
-
There really isn't any good reason to stash a method directly into
struct gendisk. Move it together with the other block device
operations.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
25 Mar, 2020
12 commits
-
These macros are just used by a few files. Move them out of genhd.h,
which is included everywhere, into a new standalone header.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
None of this needs to be exposed to drivers.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
This is bio layer functionality and not related to buffer heads.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
get_gendisk is not used by any modular code.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
disk_map_sector_rcu is not used by any modular code.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
disk_get_part is not used by any modular code.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
Factor out the requeue handling from the dispatch code. This will make
the subsequent addition of different requeueing schemes easier.
Signed-off-by: Johannes Thumshirn
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
Column "time_in_queue" in diskstats is supposed to show the total waiting time
of all requests, i.e. its value should be equal to the sum of the times from the
other columns. But this is not true, because "time_in_queue" is counted
separately in jiffies rather than in nanoseconds like the other times.

This patch removes the redundant counter for "time_in_queue" and shows the total
time of read, write, discard and flush requests.
Signed-off-by: Konstantin Khlebnikov
Signed-off-by: Jens Axboe
-
Reading /proc/diskstats iterates over all cpus for summing each field.
It's faster to sum all fields in one pass.

Hammering /proc/diskstats with fio shows 2x performance improvement:

  fio --name=test --numjobs=$JOBS --filename=/proc/diskstats \
      --size=1k --bs=1k --fallocate=none --create_on_open=1 \
      --time_based=1 --runtime=10 --invalidate=0 --group_report

           JOBS=1     JOBS=10
  Before:  7k iops    64k iops
  After:   18k iops   120k iops

Also this way code is more compact:

  add/remove: 1/0 grow/shrink: 0/2 up/down: 194/-1540 (-1346)
  Function                 old     new   delta
  part_stat_read_all         -     194    +194
  diskstats_show          1344     631    -713
  part_stat_show          1219     392    -827
  Total: Before=14966947, After=14965601, chg -0.01%

Signed-off-by: Konstantin Khlebnikov
Signed-off-by: Jens Axboe
-
Currently io_ticks is approximated by adding one at each start and end of
requests if jiffies counter has changed. This works perfectly for requests
shorter than a jiffy or if one of the requests starts/ends at each jiffy.

If the disk executes just one request at a time and they are longer than two
jiffies, then only the first and last jiffies will be accounted.

The fix is simple: at the end of a request, add to io_ticks the jiffies passed
since the last update rather than just one jiffy.

Example: a common HDD executes random read 4k requests in around 12ms.

  fio --name=test --filename=/dev/sdb --rw=randread --direct=1 --runtime=30 &
  iostat -x 10 sdb

Note the change of iostat's "%util" 8,43% -> 99,99% before/after the patch:

Before:
Device:  rrqm/s  wrqm/s    r/s   w/s   rkB/s  wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sdb        0,00    0,00  82,60  0,00  330,40   0,00      8,00      0,96  12,09    12,09     0,00   1,02   8,43

After:
Device:  rrqm/s  wrqm/s    r/s   w/s   rkB/s  wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sdb        0,00    0,00  82,50  0,00  330,00   0,00      8,00      1,00  12,10    12,10     0,00  12,12  99,99

Now io_ticks does not lose time between the start and end of requests, but
for queue-depth > 1 some I/O time between adjacent starts might be lost.

For load estimation "%util" is not as useful as average queue length,
but it clearly shows how often the disk queue is completely empty.
Fixes: 5b18b5a73760 ("block: delete part_round_stats and switch to less precise counting")
Signed-off-by: Konstantin Khlebnikov
Reviewed-by: Ming Lei
Signed-off-by: Jens Axboe
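The accounting rule described above can be modelled in a few lines of single-threaded userspace C. This is only a sketch under stated assumptions: struct disk_stats, update_io_ticks and simulate are hypothetical names, time is an abstract jiffy counter, and the atomic update that a real per-device counter would need is left out. The patched flag switches between the old rule (add one tick whenever the jiffy changed) and the new rule (at request end, add the jiffies elapsed since the last update).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-disk counters: last update time and busy ticks. */
struct disk_stats {
    unsigned long stamp;
    unsigned long io_ticks;
};

/*
 * Called at the start (end == false) and end (end == true) of a request.
 * Old rule: add one tick if the jiffy changed since the last update.
 * New rule: at request end, add all jiffies since the last update.
 */
void update_io_ticks(struct disk_stats *s, unsigned long now,
                     bool end, bool patched)
{
    if (s->stamp == now)
        return;

    s->io_ticks += (patched && end) ? now - s->stamp : 1;
    s->stamp = now;
}

/* Run nreq back-to-back requests of len jiffies each at queue depth 1. */
unsigned long simulate(unsigned long nreq, unsigned long len, bool patched)
{
    struct disk_stats s = { 0, 0 };
    unsigned long now = 1;
    unsigned long i;

    assert(len > 0);
    for (i = 0; i < nreq; i++) {
        update_io_ticks(&s, now, false, patched);   /* request start */
        now += len;
        update_io_ticks(&s, now, true, patched);    /* request end */
    }
    return s.io_ticks;
}
```

For ten back-to-back requests of 12 jiffies each (120 jiffies of busy time), simulate(10, 12, false) yields 11 ticks and simulate(10, 12, true) yields 121. This matches the shape of the "%util" change in the iostat output, roughly one twelfth of the time before versus nearly all of it after.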
24 Mar, 2020
18 commits
-
Merge block/partition-generic.c and block/partitions/check.c into
a single block/partitions/core.c as the content is closely related
and both files are tiny.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
All these are just used in block/partitions/msdos.c, so move them out of the
genhd.h header included by every driver.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
Just always use NEW_SOLARIS_X86_PARTITION and explain the situation,
as that is less confusing than two names for a single value.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
The enum containing the *_PARTITION symbolic names is only relevant
for the partition parser. More specifically most values are MSDOS
partition table system indicators and thus should go straight into
msdos.c. One value is only used by the sun partition parser, and the
sun and sgi partition parsers use the same value as the x86 Linux
RAID indicator to also indicate RAID autodetection. Duplicate them
in sun.c and sgi.c given that the different partition types use
entirely different values otherwise.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
struct partition is the on-disk format of a MSDOS partition table entry.
Move it out of genhd.h into a new msdos_partition.h header and give it
a msdos_ prefix to avoid confusion.
Also move the magic number from block/partitions/msdos.h to the new
header so that it can be used by the SCSI drivers looking at the DOS
partition tables.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
Just move the two defines to block/partitions/sun.c.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
Just move the single define to block/partitions/sgi.c.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
Just move the single define to block/partitions/osf.c.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
Just move the single define to block/partitions/karma.c.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
There is no good reason to include one header per partition type in
core.c. Instead move the prototypes for the detection routines to
check.h, and remove all now empty headers in block/partitions/.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
The warn_no_part is initialized to 1 and never changed. Remove
it and execute the code keyed off from it unconditionally.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
Add a new include/linux/raid/detect.h header to declare the
md_autodetect_dev prototype which can be shared between md and
the partition code. Then use IS_BUILTIN to call it instead of the
ifdef magic.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
read_dev_sector and put_dev_sector are now only used by the partition
parsing code. Remove the export for read_dev_sector and merge it into
the only caller. Clean the mess up a bit by using goto labels and
the SECTOR_SHIFT constant.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
There isn't any good reason not to simply open code the allocation and
freeing of the partition_meta_info structure. Especially as one of
the branches in alloc_part_info is entirely dead code.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
Move the sysfs _show methods that are used both on the full disk and
partition nodes to genhd.c instead of hiding them in the partitioning
code. Also move the declaration for these methods to block/blk.h so
that we don't expose them to drivers.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
These functions aren't really related to partition support, so move them
to a more suitable place.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
There is no good reason for __bdevname to exist. Just open code
printing the string in the callers. For three of them the format
string can be trivially merged into existing printk statements,
and in init/do_mounts.c we can at least do the scnprintf once at
the start of the function, and unconditional of CONFIG_BLOCK to
make the output for tiny configs a little more helpful.
Acked-by: Theodore Ts'o # for ext4
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
This function is only used by init/do_mounts.c, which can't be modular.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe