01 May, 2019
1 commit
-
All these files have some form of the usual GPLv2 boilerplate. Switch
them to use SPDX tags instead.
Reviewed-by: Chaitanya Kulkarni
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
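For reference, a conversion of this kind replaces the multi-line GPLv2 notice
at the top of each file with a single SPDX tag; a generic before/after sketch
(illustrative only, not taken from the patch) looks like:

    // SPDX-License-Identifier: GPL-2.0
    /*
     * Copyright (C) <year> <author>
     *
     * (the old "This program is free software; you can redistribute it..."
     * paragraph is deleted)
     */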
30 Apr, 2019
4 commits
-
Share the bi_size update by moving the done label up, and duplicate
the bv_len update in the two callers to get rid of the bvec_merge
label.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
We are never called with file system pages by definition for the
passthrough interface, and we also never undo any addition later
these days.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
The same page optimization is a rather odd corner case, which is not
used outside bio.c and which really should not be used outside of bio.c
either - we have better high-level helpers like the rq/bio mapping
helpers.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
We only have two callers that need the integer loop iterator, and they
can easily maintain it themselves.
Suggested-by: Matthew Wilcox
Reviewed-by: Johannes Thumshirn
Acked-by: David Sterba
Reviewed-by: Hannes Reinecke
Acked-by: Coly Li
Reviewed-by: Matthew Wilcox
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
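A hedged before/after sketch of such a caller-side change (do_work() is a
hypothetical stand-in for whatever the caller does per page; the exact call
sites are in the patch itself):

    struct bio_vec *bvec;
    struct bvec_iter_all iter_all;
    int i = 0;

    /* old form: bio_for_each_segment_all(bvec, bio, i, iter_all) */
    bio_for_each_segment_all(bvec, bio, iter_all) {
        do_work(bvec->bv_page, i);  /* caller keeps its own index */
        i++;
    }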
24 Apr, 2019
1 commit
-
The refcount has been increased for pages retrieved from a non-bvec iov iter
via __bio_iov_iter_get_pages(), so we don't need to do that again.
Otherwise, IO pages are leaked easily.
Cc: Christoph Hellwig
Reviewed-by: Chaitanya Kulkarni
Fixes: 7321ecbfc7cf ("block: change how we get page references in bio_iov_iter_get_pages")
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
23 Apr, 2019
1 commit
-
bio_add_page() and __bio_add_page() are capable of adding pages to a
bio, and now we have at least two such usages already:
- __bio_iov_bvec_add_pages()
- nvmet_bdev_execute_rw()
So update the comments on these two helpers.
The situation is a bit special for __bio_try_merge_page(), given the caller
needs to know if the newly added page is the same as the last added page;
then it isn't safe to pass multiple pages in case 'same_page' is true,
so add a warning on potential misuse, and update the comment on
__bio_try_merge_page().
Cc: linux-xfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Hannes Reinecke
Reviewed-by: Christoph Hellwig
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
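A minimal sketch of the kind of guard described above (assumed form only; the
exact check added by the patch may differ):

    /*
     * 'same_page' is only meaningful for a single page, so refuse
     * multi-page additions when the caller asked for it.
     */
    if (WARN_ON_ONCE(same_page && len > PAGE_SIZE))
        return false;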
22 Apr, 2019
1 commit
-
Pull in v5.1-rc6 to resolve two conflicts. One is in BFQ, in just a
comment, and is trivial. The other one is a conflict due to a later fix
in the bio multi-page work, and needs a bit more care.
* tag 'v5.1-rc6': (770 commits)
Linux 5.1-rc6
block: make sure that bvec length can't be overflow
block: kill all_q_node in request_queue
x86/cpu/intel: Lower the "ENERGY_PERF_BIAS: Set to normal" message's log priority
coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping
mm/kmemleak.c: fix unused-function warning
init: initialize jump labels before command line option parsing
kernel/watchdog_hld.c: hard lockup message should end with a newline
kcov: improve CONFIG_ARCH_HAS_KCOV help text
mm: fix inactive list balancing between NUMA nodes and cgroups
mm/hotplug: treat CMA pages as unmovable
proc: fixup proc-pid-vm test
proc: fix map_files test on F29
mm/vmstat.c: fix /proc/vmstat format for CONFIG_DEBUG_TLBFLUSH=y CONFIG_SMP=n
mm/memory_hotplug: do not unlock after failing to take the device_hotplug_lock
mm: swapoff: shmem_unuse() stop eviction without igrab()
mm: swapoff: take notice of completion sooner
mm: swapoff: remove too limiting SWAP_UNUSE_MAX_TRIES
mm: swapoff: shmem_find_swap_entries() filter out other types
slab: store tagged freelist for off-slab slabmgmt
...
Signed-off-by: Jens Axboe
12 Apr, 2019
4 commits
-
We currently have to call nth_page when iterating over pages inside a
bio_vec. Jens complained a while ago that this is fairly expensive.
To mitigate this we can check that the actual page structures
are contiguous when adding them to the bio, and just use pointer
arithmetic later on (see the sketch after this group of commits).
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
Instead of needing a special macro to iterate over all pages in
a bvec, just do a second pass over the whole bio. This also matches
what we do on the release side. The release-side helper is moved
up to where we need the get helper, to clearly express the symmetry.
Reviewed-by: Ming Lei
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
No caller uses bio_iov_iter_get_pages multiple times on a given bio,
and that functionality isn't all that useful. Removing it will make
some future changes a little easier and also simplifies the function
a bit.
Reviewed-by: Ming Lei
Reviewed-by: Bart Van Assche
Signed-off-by: Christoph Hellwig
Reviewed-by: Johannes Thumshirn
Signed-off-by: Jens Axboe
-
Return early on error, and add an unlikely annotation for that case.
Reviewed-by: Ming Lei
Signed-off-by: Christoph Hellwig
Reviewed-by: Bart Van Assche
Reviewed-by: Johannes Thumshirn
Signed-off-by: Jens Axboe
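For the nth_page avoidance in the first commit of this group, the idea can be
sketched roughly as follows (pages_are_contiguous() is a hypothetical name;
the patch's actual helper and test may differ):

    /*
     * If the next struct page directly follows the last one in memory,
     * later iteration can use plain pointer arithmetic (page + n) instead
     * of the more expensive nth_page() pfn round-trip.
     */
    static inline bool pages_are_contiguous(struct page *last, struct page *next)
    {
        return next == last + 1;
    }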
11 Apr, 2019
1 commit
-
When bio_add_pc_page() fails in bio_copy_user_iov() we should free
the page we just allocated otherwise we are leaking it.
Cc: linux-block@vger.kernel.org
Cc: Linus Torvalds
Cc: stable@vger.kernel.org
Reviewed-by: Chaitanya Kulkarni
Signed-off-by: Jérôme Glisse
Signed-off-by: Jens Axboe
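A hedged sketch of the error path described above (the surrounding
bio_copy_user_iov() loop is paraphrased; exact conditions may differ from the
patch):

    page = alloc_page(q->bounce_gfp | gfp_mask);
    if (!page)
        goto cleanup;

    if (bio_add_pc_page(q, bio, page, bytes, offset) < bytes) {
        /* the page was never added to the bio, so we still own it */
        if (!map_data)
            __free_page(page);
        break;
    }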
04 Apr, 2019
1 commit
-
With the introduction of BIO_NO_PAGE_REF we've used up all available bits
in bio::bi_flags.
Convert the defines of the flags to an enum and add a BUILD_BUG_ON() call
to make sure no-one adds a new one and thus overrides the BVEC_POOL_IDX,
causing crashes.
Reviewed-by: Ming Lei
Reviewed-by: Hannes Reinecke
Reviewed-by: Bart Van Assche
Reviewed-by: Christoph Hellwig
Signed-off-by: Johannes Thumshirn
Signed-off-by: Jens Axboe
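A hedged sketch of the pattern (flag list abbreviated; the real definitions
live in include/linux/blk_types.h, and the BUILD_BUG_ON() sits in a function
context in the patch):

    enum {
        BIO_NO_PAGE_REF,    /* don't put release vec pages */
        BIO_CLONED,         /* doesn't own data */
        /* ... remaining flags ... */
        BIO_FLAG_LAST
    };

    /* bi_flags shares its top bits with the bvec pool index */
    BUILD_BUG_ON(BIO_FLAG_LAST > BVEC_POOL_OFFSET);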
02 Apr, 2019
5 commits
-
Now the block IO stack is basically ready for supporting multi-page bvecs,
however it isn't enabled for passthrough IO.
One reason is that passthrough IO is dispatched to the LLD directly and bio
split is bypassed, so the bio has to be built correctly for dispatch to the
LLD from the beginning.
Implement multi-page support for passthrough IO by limiting each bvec
to a block device segment and applying all kinds of queue limits in
blk_add_pc_page(). Then we don't need to calculate segments for
passthrough IO any more, and the code turns out to be much simplified.
Cc: Omar Sandoval
Cc: Christoph Hellwig
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
-
When the added page is merged into the last same page in bio_add_pc_page(),
the user may need to put this page to avoid a page leak.
bio_map_user_iov() needs this kind of handling, and currently deals with
it by itself in a hacky way.
Move the handling of putting the page into __bio_add_pc_page(), so that
bio_map_user_iov() may be simplified a bit, and maybe more users
can benefit from this change.
Cc: Omar Sandoval
Cc: Christoph Hellwig
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
-
Now the check for deciding if one page is mergeable into the current bvec
has become a bit complicated, and we need to reuse the code before
adding a pc page.
So move the check into one dedicated helper.
No functional change.
Cc: Omar Sandoval
Cc: Christoph Hellwig
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
-
REQ_PC is out of date, so replace it with passthrough IO.
Also remove the local variable 'prev' since we can reuse
the top local variable 'bvec'.
No functional change.
Cc: Omar Sandoval
Cc: Christoph Hellwig
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
-
XEN has special page merge requirements, see xen_biovec_phys_mergeable().
We can't simply merge pages into one bvec for XEN.
So move XEN's specific check on page merging into __bio_try_merge_page(),
then avoid breaking XEN with multi-page bvecs.
Cc: Boris Ostrovsky
Cc: xen-devel@lists.xenproject.org
Cc: Omar Sandoval
Cc: Christoph Hellwig
Reviewed-by: Juergen Gross
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
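The dedicated helper motivated by the last three commits can be sketched
roughly like this (hedged approximation; the real page_is_mergeable() in
bio.c also deals with offsets within the trailing page and the 'same_page'
case):

    static bool page_is_mergeable(const struct bio_vec *bv,
            struct page *page, unsigned int len, unsigned int off)
    {
        phys_addr_t vec_end = page_to_phys(bv->bv_page) +
            bv->bv_offset + bv->bv_len - 1;

        /* the new range must start right where the bvec ends */
        if (vec_end + 1 != page_to_phys(page) + off)
            return false;
        /* XEN may forbid merging even physically contiguous pages */
        if (xen_domain() && !xen_biovec_phys_mergeable(bv, page))
            return false;
        return true;
    }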
19 Mar, 2019
1 commit
-
If bio_iov_iter_get_pages() is called on an iov_iter that is flagged
with NO_REF, then we don't need to add a page reference for the pages
that we add.
Add BIO_NO_PAGE_REF to track this in the bio, so IO completion knows
not to drop a reference to these pages.
Signed-off-by: Jens Axboe
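On the completion side the flag check amounts to something like the following
fragment, inside the loop that walks the bio's pages (hedged, not the full
release path):

    /* only drop page references that the submission path actually took */
    if (!bio_flagged(bio, BIO_NO_PAGE_REF))
        put_page(bvec->bv_page);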
28 Feb, 2019
1 commit
-
For an ITER_BVEC, we can just iterate the iov and add the pages
to the bio directly. For now, we grab a reference to those pages,
and release them normally on IO completion. This isn't really needed
for the normal case of O_DIRECT from/to a file, but some of the more
esoteric use cases (like splice(2)) will unconditionally put the
pipe buffer pages when the buffers are released. Until we can manage
that case properly, ITER_BVEC pages are treated like normal pages
in terms of reference counting.
Reviewed-by: Hannes Reinecke
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe
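A hedged sketch of the bvec path described above (approximate; the real
__bio_iov_bvec_add_pages() also validates the iterator offset):

    const struct bio_vec *bv = iter->bvec;
    unsigned int len = min_t(size_t, bv->bv_len - iter->iov_offset,
                             iter->count);

    if (bio_add_page(bio, bv->bv_page, len,
                     bv->bv_offset + iter->iov_offset) != len)
        return -EINVAL;
    /* hold a reference for now; dropped as usual at IO completion */
    get_page(bv->bv_page);
    iov_iter_advance(iter, len);
    return 0;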
15 Feb, 2019
2 commits
-
This patch pulls the trigger for multi-page bvecs.
Reviewed-by: Omar Sandoval
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
-
This patch introduces one extra iterator variable to bio_for_each_segment_all(),
so that we can allow bio_for_each_segment_all() to iterate over multi-page bvecs.
Given it is just a mechanical & simple change to all bio_for_each_segment_all()
users, this patch does the tree-wide change in one single patch, so that we can
avoid using a temporary helper for this conversion.
Reviewed-by: Omar Sandoval
Reviewed-by: Christoph Hellwig
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
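After this change a caller looks roughly like the following sketch (the extra
'iter_all' argument is what this patch adds; the integer index is dropped
again by the April 2019 commit earlier in this log):

    struct bio_vec *bvec;
    struct bvec_iter_all iter_all;
    int i;

    bio_for_each_segment_all(bvec, bio, i, iter_all)
        put_page(bvec->bv_page);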
03 Jan, 2019
1 commit
-
Pull more block updates from Jens Axboe:
- Dead code removal for loop/sunvdc (Chengguang)
- Mark BIDI support for bsg as deprecated, logging a single dmesg
warning if anyone is actually using it (Christoph)
- blkcg cleanup, killing a dead function and making the tryget_closest
variant easier to read (Dennis)
- Floppy fixes, one fixing a regression in swim3 (Finn)
- lightnvm use-after-free fix (Gustavo)
- gdrom leak fix (Wenwen)
- a set of drbd updates (Lars, Luc, Nathan, Roland)
* tag 'for-4.21/block-20190102' of git://git.kernel.dk/linux-block: (28 commits)
block/swim3: Fix regression on PowerBook G3
block/swim3: Fix -EBUSY error when re-opening device after unmount
block/swim3: Remove dead return statement
block/amiflop: Don't log error message on invalid ioctl
gdrom: fix a memory leak bug
lightnvm: pblk: fix use-after-free bug
block: sunvdc: remove redundant code
block: loop: remove redundant code
bsg: deprecate BIDI support in bsg
blkcg: remove unused __blkg_release_rcu()
blkcg: clean up blkg_tryget_closest()
drbd: Change drbd_request_detach_interruptible's return type to int
drbd: Avoid Clang warning about pointless switch statment
drbd: introduce P_ZEROES (REQ_OP_WRITE_ZEROES on the "wire")
drbd: skip spurious timeout (ping-timeo) when failing promote
drbd: don't retry connection if peers do not agree on "authentication" settings
drbd: fix print_st_err()'s prototype to match the definition
drbd: avoid spurious self-outdating with concurrent disconnect / down
drbd: do not block when adjusting "disk-options" while IO is frozen
drbd: fix comment typos
...
29 Dec, 2018
1 commit
-
Pull block updates from Jens Axboe:
"This is the main pull request for block/storage for 4.21.Larger than usual, it was a busy round with lots of goodies queued up.
Most notable is the removal of the old IO stack, which has been a long
time coming. No new features for a while, everything coming in this
week has all been fixes for things that were previously merged.
This contains:
- Use atomic counters instead of semaphores for mtip32xx (Arnd)
- Cleanup of the mtip32xx request setup (Christoph)
- Fix for circular locking dependency in loop (Jan, Tetsuo)
- bcache (Coly, Guoju, Shenghui)
* Optimizations for writeback caching
* Various fixes and improvements
- nvme (Chaitanya, Christoph, Sagi, Jay, me, Keith)
* host and target support for NVMe over TCP
* Error log page support
* Support for separate read/write/poll queues
* Much improved polling
* discard OOM fallback
* Tracepoint improvements
- lightnvm (Hans, Hua, Igor, Matias, Javier)
* Igor added packed metadata to pblk. Now drives without metadata
per LBA can be used as well.
* Fix from Geert on uninitialized value on chunk metadata reads.
* Fixes from Hans and Javier to pblk recovery and write path.
* Fix from Hua Su to fix a race condition in the pblk recovery
code.
* Scan optimization added to pblk recovery from Zhoujie.
* Small geometry cleanup from me.
- Conversion of the last few drivers that used the legacy path to
blk-mq (me)
- Removal of legacy IO path in SCSI (me, Christoph)
- Removal of legacy IO stack and schedulers (me)
- Support for much better polling, now without interrupts at all.
blk-mq adds support for multiple queue maps, which enables us to
have a map per type. This in turn enables nvme to have separate
completion queues for polling, which can then be interrupt-less.
Also means we're ready for async polled IO, which is hopefully
coming in the next release.
- Killing of (now) unused block exports (Christoph)
- Unification of the blk-rq-qos and blk-wbt wait handling (Josef)
- Support for zoned testing with null_blk (Masato)
- sx8 conversion to per-host tag sets (Christoph)
- IO priority improvements (Damien)
- mq-deadline zoned fix (Damien)
- Ref count blkcg series (Dennis)
- Lots of blk-mq improvements and speedups (me)
- sbitmap scalability improvements (me)
- Make core inflight IO accounting per-cpu (Mikulas)
- Export timeout setting in sysfs (Weiping)
- Cleanup the direct issue path (Jianchao)
- Export blk-wbt internals in block debugfs for easier debugging
(Ming)
- Lots of other fixes and improvements"
* tag 'for-4.21/block-20181221' of git://git.kernel.dk/linux-block: (364 commits)
kyber: use sbitmap add_wait_queue/list_del wait helpers
sbitmap: add helpers for add/del wait queue handling
block: save irq state in blkg_lookup_create()
dm: don't reuse bio for flushes
nvme-pci: trace SQ status on completions
nvme-rdma: implement polling queue map
nvme-fabrics: allow user to pass in nr_poll_queues
nvme-fabrics: allow nvmf_connect_io_queue to poll
nvme-core: optionally poll sync commands
block: make request_to_qc_t public
nvme-tcp: fix spelling mistake "attepmpt" -> "attempt"
nvme-tcp: fix endianess annotations
nvmet-tcp: fix endianess annotations
nvme-pci: refactor nvme_poll_irqdisable to make sparse happy
nvme-pci: only set nr_maps to 2 if poll queues are supported
nvmet: use a macro for default error location
nvmet: fix comparison of a u16 with -1
blk-mq: enable IO poll if .nr_queues of type poll > 0
blk-mq: change blk_mq_queue_busy() to blk_mq_queue_inflight()
blk-mq: skip zero-queue maps in blk_mq_map_swqueue
...
21 Dec, 2018
1 commit
-
The implementation of blkg_tryget_closest() wasn't super obvious and
became a point of suspicion when debugging [1]. So let's clean it up so
it's obviously not the problem.
Also add missing RCU read locking to bio_clone_blkg_association(), which
got exposed by adding the RCU read lock held check in
blkg_tryget_closest().
[1] https://lore.kernel.org/linux-block/a7e97e4b-0dd8-3a54-23b7-a0f27b17fde8@kernel.dk/
Signed-off-by: Dennis Zhou
Signed-off-by: Jens Axboe
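The RCU part of the fix can be sketched as follows (hedged; close to, but not
necessarily identical to, the final code):

    void bio_clone_blkg_association(struct bio *dst, struct bio *src)
    {
        rcu_read_lock();
        if (src->bi_blkg)
            __bio_associate_blkg(dst, src->bi_blkg);
        rcu_read_unlock();
    }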
14 Dec, 2018
3 commits
-
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
11 Dec, 2018
1 commit
-
We don't need to zero fill the bio if not using kernel allocated pages.
Fixes: f3587d76da05 ("block: Clear kernel memory before copying to user") # v4.20-rc2
Reported-by: Todd Aiken
Cc: Laurence Oberman
Cc: stable@vger.kernel.org
Cc: Bart Van Assche
Tested-by: Laurence Oberman
Signed-off-by: Keith Busch
Signed-off-by: Jens Axboe
10 Dec, 2018
2 commits
-
We want to convert to per-cpu in_flight counters.
The function part_round_stats needs the in_flight counter every jiffy, it
would be too costly to sum all the percpu variables every jiffy, so it
must be deleted. part_round_stats is used to calculate two counters -
time_in_queue and io_ticks.
time_in_queue can be calculated without part_round_stats, by adding the
duration of the I/O when the I/O ends (the value is almost as exact as the
previously calculated value, except that time for in-progress I/Os is not
counted).
io_ticks can be approximated by increasing the value when I/O is started
or ended and the jiffies value has changed. If the I/Os take less than a
jiffy, the value is as exact as the previously calculated value. If the
I/Os take more than a jiffy, io_ticks can drift behind the previously
calculated value (see the sketch after this group of commits).
Signed-off-by: Mikulas Patocka
Signed-off-by: Mike Snitzer
Signed-off-by: Jens Axboe
-
All of part_stat_* and related methods are used with preempt disabled,
so there is no need to pass the cpu around to all of them. Just call
smp_processor_id() as needed.
Suggested-by: Jens Axboe
Signed-off-by: Mike Snitzer
Signed-off-by: Jens Axboe
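For the io_ticks approximation described in the first commit of this group,
a minimal sketch of the idea (helper name and cmpxchg-based update are
assumptions; the merged code may differ):

    static void update_io_ticks(struct hd_struct *part, unsigned long now)
    {
        unsigned long stamp = READ_ONCE(part->stamp);

        /* charge at most one tick per jiffy, at I/O start or end */
        if (unlikely(stamp != now)) {
            if (likely(cmpxchg(&part->stamp, stamp, now) == stamp))
                __part_stat_add(part, io_ticks, 1);
        }
    }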
08 Dec, 2018
8 commits
-
blkg reference counting now uses percpu_ref rather than atomic_t. Let's
make this consistent with css_tryget. This renames blkg_try_get to
blkg_tryget and now returns a bool rather than the blkg or %NULL.
Signed-off-by: Dennis Zhou
Reviewed-by: Josef Bacik
Acked-by: Tejun Heo
Signed-off-by: Jens Axboe
-
Now that a bio only holds a blkg reference, cleanup is simply
putting back that reference. Remove bio_disassociate_task() as it just
calls bio_disassociate_blkg() and call the latter directly.
Signed-off-by: Dennis Zhou
Acked-by: Tejun Heo
Reviewed-by: Josef Bacik
Signed-off-by: Jens Axboe
-
The previous patch in this series removed carrying around a pointer to
the css in blkg. However, the blkg association logic still relied on
taking a reference on the css to ensure we wouldn't fail in getting a
reference for the blkg.
Here the implicit dependency on the css is removed. The association
continues to rely on the tryget logic walking up the blkg tree. This
streamlines the three ways that association can happen: normal, swap,
and writeback.
Signed-off-by: Dennis Zhou
Acked-by: Tejun Heo
Reviewed-by: Josef Bacik
Signed-off-by: Jens Axboe
-
Prior patches ensured that any bio that interacts with a request_queue
is properly associated with a blkg. This makes bio->bi_css unnecessary
as blkg maintains a reference to blkcg already.
This removes the bio field bi_css and transfers corresponding uses to
access via bi_blkg.
Signed-off-by: Dennis Zhou
Reviewed-by: Josef Bacik
Acked-by: Tejun Heo
Signed-off-by: Jens Axboe
-
One of the goals of this series is to remove a separate reference to
the css of the bio. This can and should be accessed via bio_blkcg(). In
this patch, wbc_init_bio() now requires a bio to have a device
associated with it.
Signed-off-by: Dennis Zhou
Reviewed-by: Josef Bacik
Acked-by: Tejun Heo
Signed-off-by: Jens Axboe
-
A prior patch in this series added blkg association to bios issued by
cgroups. There are two other paths that we want to attribute work back
to the appropriate cgroup: swap and writeback. Here we modify the way
swap tags bios to include the blkg. Writeback will be tackled in the next
patch.
Signed-off-by: Dennis Zhou
Reviewed-by: Josef Bacik
Acked-by: Tejun Heo
Signed-off-by: Jens Axboe
-
bio_issue_init among other things initializes the timestamp for an IO.
Rather than have this logic handled by policies, this consolidates it to
be on the init paths (normal, clone, bounce clone).
Signed-off-by: Dennis Zhou
Acked-by: Tejun Heo
Reviewed-by: Liu Bo
Reviewed-by: Josef Bacik
Signed-off-by: Jens Axboe
-
Previously, blkg association was handled by controller specific code in
blk-throttle and blk-iolatency. However, because a blkg represents a
relationship between a blkcg and a request_queue, it makes sense to keep
the blkg->q and bio->bi_disk->queue consistent.
This patch moves association into the bio_set_dev() macro. This should
cover the majority of cases where the device is set/changed, keeping the
two pointers consistent. Fallback code is added to
blkcg_bio_issue_check() to catch any missing paths.
Signed-off-by: Dennis Zhou
Reviewed-by: Josef Bacik
Signed-off-by: Jens Axboe
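The association point described in the last commit can be sketched as an
addition to the bio_set_dev() macro (hedged approximation of the macro at the
time):

    #define bio_set_dev(bio, bdev)                          \
    do {                                                    \
        if ((bio)->bi_disk != (bdev)->bd_disk)              \
            bio_clear_flag(bio, BIO_THROTTLED);             \
        (bio)->bi_disk = (bdev)->bd_disk;                   \
        (bio)->bi_partno = (bdev)->bd_partno;               \
        bio_associate_blkg(bio);                            \
    } while (0)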