29 Oct, 2020

1 commit


24 Oct, 2020

2 commits


14 Oct, 2020

2 commits

  • A zoned device with limited resources to open or activate zones may
    return an error when the host exceeds those limits. The same command may
    be successful if retried later, but the host needs to wait for specific
    zone states before it should expect a retry to succeed. Have the block
    layer provide an appropriate status for these conditions so applications
    can distinguish this error for special handling.

    Cc: linux-api@vger.kernel.org
    Cc: Niklas Cassel
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Damien Le Moal
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Martin K. Petersen
    Signed-off-by: Keith Busch
    Signed-off-by: Jens Axboe

    Keith Busch
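    A minimal driver-side sketch of returning these statuses, assuming the
    BLK_STS_ZONE_OPEN_RESOURCE and BLK_STS_ZONE_ACTIVE_RESOURCE names this
    series adds; the helper and its arguments are illustrative only:

        #include <linux/blk_types.h>

        /* Map a device's zone resource exhaustion to the new statuses so
         * the host knows the command may succeed if retried later. */
        static blk_status_t example_zone_limit_to_status(bool open_limit_hit,
                                                         bool active_limit_hit)
        {
                if (open_limit_hit)
                        return BLK_STS_ZONE_OPEN_RESOURCE;
                if (active_limit_hit)
                        return BLK_STS_ZONE_ACTIVE_RESOURCE;
                return BLK_STS_IOERR;
        }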
     
  • Pull block updates from Jens Axboe:

    - Series of merge handling cleanups (Baolin, Christoph)

    - Series of blk-throttle fixes and cleanups (Baolin)

    - Series cleaning up BDI, separating the block device from the
    backing_dev_info (Christoph)

    - Removal of bdget() as a generic API (Christoph)

    - Removal of blkdev_get() as a generic API (Christoph)

    - Cleanup of is-partition checks (Christoph)

    - Series reworking disk revalidation (Christoph)

    - Series cleaning up bio flags (Christoph)

    - bio crypt fixes (Eric)

    - IO stats inflight tweak (Gabriel)

    - blk-mq tags fixes (Hannes)

    - Buffer invalidation fixes (Jan)

    - Allow soft limits for zone append (Johannes)

    - Shared tag set improvements (John, Kashyap)

    - Allow IOPRIO_CLASS_RT for CAP_SYS_NICE (Khazhismel)

    - DM no-wait support (Mike, Konstantin)

    - Request allocation improvements (Ming)

    - Allow md/dm/bcache to use IO stat helpers (Song)

    - Series improving blk-iocost (Tejun)

    - Various cleanups (Geert, Damien, Danny, Julia, Tetsuo, Tian, Wang,
    Xianting, Yang, Yufen, yangerkun)

    * tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (191 commits)
    block: fix uapi blkzoned.h comments
    blk-mq: move cancel of hctx->run_work to the front of blk_exit_queue
    blk-mq: get rid of the dead flush handle code path
    block: get rid of unnecessary local variable
    block: fix comment and add lockdep assert
    blk-mq: use helper function to test hw stopped
    block: use helper function to test queue register
    block: remove redundant mq check
    block: invoke blk_mq_exit_sched no matter whether have .exit_sched
    percpu_ref: don't refer to ref->data if it isn't allocated
    block: ratelimit handle_bad_sector() message
    blk-throttle: Re-use the throtl_set_slice_end()
    blk-throttle: Open code __throtl_de/enqueue_tg()
    blk-throttle: Move service tree validation out of the throtl_rb_first()
    blk-throttle: Move the list operation after list validation
    blk-throttle: Fix IO hang for a corner case
    blk-throttle: Avoid tracking latency if low limit is invalid
    blk-throttle: Avoid getting the current time if tg->last_finish_time is 0
    blk-throttle: Remove a meaningless parameter for throtl_downgrade_state()
    block: Remove redundant 'return' statement
    ...

    Linus Torvalds
     

27 Sep, 2020

1 commit


25 Sep, 2020

1 commit

  • commit 7b6620d7db56 ("block: remove REQ_NOWAIT_INLINE") removed the
    REQ_NOWAIT_INLINE related code, but the diff wasn't applied to
    blk_types.h somehow.

    Then commit 2771cefeac49 ("block: remove the REQ_NOWAIT_INLINE flag")
    removed the REQ_NOWAIT_INLINE flag while the BLK_QC_T_EAGAIN flag still
    remains.

    Fixes: 7b6620d7db56 ("block: remove REQ_NOWAIT_INLINE")
    Signed-off-by: Jeffle Xu
    Signed-off-by: Jens Axboe

    Jeffle Xu
     

24 Sep, 2020

1 commit


02 Sep, 2020

5 commits

  • Replace bd_invalidate with a new BDEV_NEED_PART_SCAN flag in a bd_flags
    variable to better describe the condition.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Josef Bacik
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • kdev_t is long gone, so we don't need to comment that a field isn't one.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Just check if there is private data, in which case the bio must have
    originated from bio_copy_user_iov.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • We can simply use a boolean flag in the bio_map_data data structure
    instead.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Two different callers use two different mutexes for updating the
    block device size, which obviously doesn't help to actually protect
    against concurrent updates from the different callers. In addition
    one of the locks, bd_mutex is rather prone to deadlocks with other
    parts of the block stack that use it for high level synchronization.

    Switch to using a new spinlock protecting just the size updates, as
    that is all we need, and make sure everyone does the update through
    the proper helper.

    This fixes a bug reported with nvme, where revalidating disks during a
    hot removal operation can currently deadlock on bd_mutex.

    Reported-by: Xianting Tian
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

07 Aug, 2020

1 commit


17 Jul, 2020

1 commit

  • Currently REQ_OP_ZONE_RESET and REQ_OP_ZONE_RESET_ALL are defined as
    the even numbers 6 and 8, so such zone reset bios are treated as READ
    bios by bio_data_dir(), which is obviously misleading.

    The macro bio_data_dir() is defined in include/linux/bio.h as:

        #define bio_data_dir(bio) \
            (op_is_write(bio_op(bio)) ? WRITE : READ)

    And op_is_write() is defined in include/linux/blk_types.h as:

        static inline bool op_is_write(unsigned int op)
        {
            return (op & 1);
        }

    The convention of op_is_write() is that when there is a data transfer
    the op code should be an odd number, and it is treated as a write op.
    bio_data_dir() treats a bio's direction as READ if op_is_write() reports
    false, and WRITE if it reports true.

    Because REQ_OP_ZONE_RESET and REQ_OP_ZONE_RESET_ALL are even numbers,
    reporting them as READ bios via bio_data_dir() is misleading and might
    be wrong, even though they don't transfer data. These two commands
    reset the write pointers of the zones being reset, and all content
    after the reset write pointer becomes invalid and inaccessible, so
    they are clearly not READ bios in any sense.

    This patch changes REQ_OP_ZONE_RESET from 6 to 15, and changes
    REQ_OP_ZONE_RESET_ALL from 8 to 17. Now bios with these two op codes
    are treated as WRITE by bio_data_dir(). Although they don't transfer
    data, we keep them consistent with REQ_OP_DISCARD and
    REQ_OP_WRITE_ZEROES, with the intuition that they change on-media
    content and should be WRITE requests.

    Signed-off-by: Coly Li
    Reviewed-by: Damien Le Moal
    Reviewed-by: Chaitanya Kulkarni
    Cc: Christoph Hellwig
    Cc: Hannes Reinecke
    Cc: Jens Axboe
    Cc: Johannes Thumshirn
    Cc: Keith Busch
    Cc: Shaun Tancheff
    Signed-off-by: Jens Axboe

    Coly Li
     

01 Jul, 2020

4 commits


24 Jun, 2020

1 commit


18 Jun, 2020

1 commit


17 May, 2020

1 commit


14 May, 2020

1 commit

  • We must have some way of letting a storage device driver know what
    encryption context it should use for en/decrypting a request. However,
    it's the upper layers (like the filesystem/fscrypt) that know about and
    manage encryption contexts. As such, when the upper layer submits a bio
    to the block layer, and this bio eventually reaches a device driver with
    support for inline encryption, the device driver will need to have been
    told the encryption context for that bio.

    We want to communicate the encryption context from the upper layer to the
    storage device along with the bio, when the bio is submitted to the block
    layer. To do this, we add a struct bio_crypt_ctx to struct bio, which can
    represent an encryption context (note that we can't use the bi_private
    field in struct bio to do this because that field does not function to pass
    information across layers in the storage stack). We also introduce various
    functions to manipulate the bio_crypt_ctx and make the bio/request merging
    logic aware of the bio_crypt_ctx.

    We also make changes to blk-mq to make it handle bios with encryption
    contexts. blk-mq can merge many bios into the same request. These bios need
    to have contiguous data unit numbers (the necessary changes to blk-merge
    are also made to ensure this) - as such, it suffices to keep the data unit
    number of just the first bio, since that's all a storage driver needs to
    infer the data unit number to use for each data block in each bio in a
    request. blk-mq keeps track of the encryption context to be used for all
    the bios in a request with the request's rq_crypt_ctx. When the first bio
    is added to an empty request, blk-mq will program the encryption context
    of that bio into the request_queue's keyslot manager, and store the
    returned keyslot in the request's rq_crypt_ctx. All the functions to
    operate on encryption contexts are in blk-crypto.c.

    Upper layers only need to call bio_crypt_set_ctx with the encryption key,
    algorithm and data_unit_num; they don't have to worry about getting a
    keyslot for each encryption context, as blk-mq/blk-crypto handles that.
    Blk-crypto also makes it possible for request-based layered devices like
    dm-rq to make use of inline encryption hardware by cloning the
    rq_crypt_ctx and programming a keyslot in the new request_queue when
    necessary.

    Note that any user of the block layer can submit bios with an
    encryption context, such as filesystems, device-mapper targets, etc.

    Signed-off-by: Satya Tangirala
    Reviewed-by: Eric Biggers
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Satya Tangirala
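    A short sketch of the upper-layer side described above, assuming the
    upstream bio_crypt_set_ctx() signature (bio, key, DUN array, gfp flags)
    with the algorithm carried inside the blk_crypto_key; key setup via
    blk_crypto_init_key() and bio construction are elided, and the wrapper
    name is illustrative:

        #include <linux/bio.h>
        #include <linux/blk-crypto.h>

        static void example_submit_encrypted(struct bio *bio,
                                             const struct blk_crypto_key *key,
                                             u64 first_dun)
        {
                u64 dun[BLK_CRYPTO_DUN_ARRAY_SIZE] = { first_dun };

                /* Attach the encryption context; blk-mq/blk-crypto handle
                 * keyslot programming when the bio reaches the hardware. */
                bio_crypt_set_ctx(bio, key, dun, GFP_NOIO);
                submit_bio(bio);
        }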
     

13 May, 2020

1 commit

  • Define REQ_OP_ZONE_APPEND to append-write sectors to a zone of a zoned
    block device. This is a no-merge write operation.

    A zone append write BIO must:
    * Target a zoned block device
    * Have a sector position indicating the start sector of the target zone
    * The target zone must be a sequential write zone
    * The BIO must not cross a zone boundary
    * The BIO must not be split, to ensure that a single range of LBAs
    is written with a single command.

    Implement these checks in generic_make_request_checks() using the
    helper function blk_check_zone_append(). To avoid write append BIO
    splitting, introduce the new max_zone_append_sectors queue limit
    attribute and ensure that a BIO size is always lower than this limit.
    Export this new limit through sysfs and check these limits in bio_full().

    Also when an LLDD can't dispatch a request to a specific zone, it
    will return BLK_STS_ZONE_RESOURCE indicating this request needs to
    be delayed, e.g. because the zone it will be dispatched to is still
    write-locked. If this happens, set the request aside in a local list
    to continue trying to dispatch requests such as READ requests or
    WRITE/ZONE_APPEND requests targeting other zones. This way we can
    still keep a high queue depth without starving other requests even if
    one request can't be served due to zone write-locking.

    Finally, make sure that the bio sector position indicates the actual
    write position as indicated by the device on completion.

    Signed-off-by: Keith Busch
    [ jth: added zone-append specific add_page and merge_page helpers ]
    Signed-off-by: Johannes Thumshirn
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Keith Busch
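    A minimal sketch of a zone append submitter, assuming a bio whose data
    pages are already mapped; only the REQ_OP_ZONE_APPEND specific pieces
    are shown, and the function names are illustrative:

        #include <linux/bio.h>

        static void example_zone_append_end_io(struct bio *bio)
        {
                /* On completion, bi_sector holds the sector the device
                 * actually wrote the data to. */
                pr_debug("zone append landed at sector %llu\n",
                         (unsigned long long)bio->bi_iter.bi_sector);
                bio_put(bio);
        }

        static void example_submit_zone_append(struct bio *bio,
                                               sector_t zone_start)
        {
                bio->bi_opf = REQ_OP_ZONE_APPEND;
                /* The sector position indicates the start of the target zone. */
                bio->bi_iter.bi_sector = zone_start;
                bio->bi_end_io = example_zone_append_end_io;
                submit_bio(bio);
        }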
     

29 Apr, 2020

1 commit


20 Apr, 2020

1 commit


19 Apr, 2020

1 commit

  • The current codebase makes use of the zero-length array language
    extension to the C90 standard, but the preferred mechanism to declare
    variable-length types such as these ones is a flexible array member[1][2],
    introduced in C99:

    struct foo {
            int stuff;
            struct boo array[];
    };

    By making use of the mechanism above, we will get a compiler warning
    in case the flexible array does not occur last in the structure, which
    will help us prevent some kind of undefined behavior bugs from being
    inadvertently introduced[3] to the codebase from now on.

    Also, notice that dynamic memory allocations won't be affected by
    this change:

    "Flexible array members have incomplete type, and so the sizeof operator
    may not be applied. As a quirk of the original implementation of
    zero-length arrays, sizeof evaluates to zero."[1]

    This issue was found with the help of Coccinelle.

    [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
    [2] https://github.com/KSPP/linux/issues/21
    [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

    Signed-off-by: Gustavo A. R. Silva

    Gustavo A. R. Silva
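    As an aside, allocations of such structures are usually sized with
    struct_size() from <linux/overflow.h>, since sizeof() now covers only
    the fixed part; the element type and function name below are
    illustrative placeholders:

        #include <linux/overflow.h>
        #include <linux/slab.h>

        struct boo { int data; };       /* placeholder element type */

        struct foo {
                int stuff;
                struct boo array[];     /* flexible array member */
        };

        static struct foo *example_alloc_foo(size_t n)
        {
                struct foo *p;

                /* Fixed header plus n trailing array elements, with
                 * overflow checking built into struct_size(). */
                p = kmalloc(struct_size(p, array, n), GFP_KERNEL);
                return p;
        }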
     

25 Jan, 2020

1 commit

  • Add a device-mapper target "dm-default-key" which assigns an encryption
    key to bios that aren't for the contents of an encrypted file.

    This ensures that all blocks on-disk will be encrypted with some key,
    without the performance hit of file contents being encrypted twice when
    fscrypt (File-Based Encryption) is used.

    It is only appropriate to use dm-default-key when key configuration is
    tightly controlled, like it is in Android, such that all fscrypt keys
    are at least as hard to compromise as the default key.

    Compared to the original version of dm-default-key, this has been
    modified to use the new vendor-independent inline encryption framework
    (which works even when no inline encryption hardware is present), the
    table syntax has been changed to match dm-crypt, and support for
    specifying Adiantum encryption has been added. These changes also mean
    that dm-default-key now always explicitly specifies the DUN (the IV).

    Also, to handle f2fs moving blocks of encrypted files around without the
    key, and to handle ext4 and f2fs filesystems mounted without
    '-o inlinecrypt', the mapping logic is no longer "set a key on the bio
    if it doesn't have one already", but rather "set a key on the bio unless
    the bio has the bi_skip_dm_default_key flag set". Filesystems set this
    flag on *all* bios for encrypted file contents, regardless of whether
    they are encrypting/decrypting the file using inline encryption or the
    traditional filesystem-layer encryption, or moving the raw data.

    For the bi_skip_dm_default_key flag, a new field in struct bio is used
    rather than a bit in bi_opf so that fscrypt_set_bio_crypt_ctx() can set
    the flag, minimizing the changes needed to filesystems. (bi_opf is
    usually overwritten after fscrypt_set_bio_crypt_ctx() is called.)

    Bug: 137270441
    Bug: 147814592
    Change-Id: I69c9cd1e968ccf990e4ad96e5115b662237f5095
    Signed-off-by: Eric Biggers

    Eric Biggers
     

09 Dec, 2019

1 commit


22 Nov, 2019

1 commit

  • Requests that trigger flushing of the volatile writeback cache to disk
    (barriers) have a significant effect on overall performance.

    The block layer has a sophisticated engine for combining several flush
    requests into one, but there are no statistics for the actual flushes
    executed by the disk. Requests which trigger flushes are usually
    barriers - zero-size writes.

    This patch adds two iostat counters into /sys/class/block/$dev/stat and
    /proc/diskstats - count of completed flush requests and their total time.

    Signed-off-by: Konstantin Khlebnikov
    Signed-off-by: Jens Axboe

    Konstantin Khlebnikov
     

07 Nov, 2019

1 commit

  • Zoned block devices (ZBC and ZAC devices) allow an explicit control
    over the condition (state) of zones. The operations allowed are:
    * Open a zone: Transition to open condition to indicate that a zone will
    actively be written
    * Close a zone: Transition to closed condition to release the drive
    resources used for writing to a zone
    * Finish a zone: Transition an open or closed zone to the full
    condition to prevent write operations

    To enable this control for in-kernel zoned block device users, define
    the new request operations REQ_OP_ZONE_OPEN, REQ_OP_ZONE_CLOSE
    and REQ_OP_ZONE_FINISH as well as the generic function
    blkdev_zone_mgmt() for submitting these operations on a range of zones.
    This results in the removal of blkdev_reset_zones() and its replacement
    with this new zone management function. Users of blkdev_reset_zones()
    (f2fs and dm-zoned) are updated accordingly.

    Contains contributions from Matias Bjorling, Hans Holmberg,
    Dmitry Fomichev, Keith Busch, Damien Le Moal and Christoph Hellwig.

    Reviewed-by: Javier González
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ajay Joshi
    Signed-off-by: Matias Bjorling
    Signed-off-by: Hans Holmberg
    Signed-off-by: Dmitry Fomichev
    Signed-off-by: Keith Busch
    Signed-off-by: Damien Le Moal
    Signed-off-by: Jens Axboe

    Ajay Joshi
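    A sketch of an in-kernel caller explicitly opening a single zone,
    assuming the signature introduced here takes the block device, the
    operation, the starting sector, the number of sectors and gfp flags;
    the wrapper name is illustrative:

        #include <linux/blkdev.h>

        static int example_open_zone(struct block_device *bdev,
                                     sector_t zone_start, sector_t zone_len)
        {
                /* Transition one zone to the open condition. */
                return blkdev_zone_mgmt(bdev, REQ_OP_ZONE_OPEN, zone_start,
                                        zone_len, GFP_KERNEL);
        }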
     

31 Oct, 2019

1 commit

  • We must have some way of letting a storage device driver know what
    encryption context it should use for en/decrypting a request. However,
    it's the filesystem/fscrypt that knows about and manages encryption
    contexts. As such, when the filesystem layer submits a bio to the block
    layer, and this bio eventually reaches a device driver with support for
    inline encryption, the device driver will need to have been told the
    encryption context for that bio.

    We want to communicate the encryption context from the filesystem layer
    to the storage device along with the bio, when the bio is submitted to the
    block layer. To do this, we add a struct bio_crypt_ctx to struct bio, which
    can represent an encryption context (note that we can't use the bi_private
    field in struct bio to do this because that field does not function to pass
    information across layers in the storage stack). We also introduce various
    functions to manipulate the bio_crypt_ctx and make the bio/request merging
    logic aware of the bio_crypt_ctx.

    Bug: 137270441
    Test: tested as series; see Ie1b77f7615d6a7a60fdc9105c7ab2200d17636a8
    Change-Id: I479de9ec13758f1978b34d897e6956e680caeb92
    Signed-off-by: Satya Tangirala
    Link: https://patchwork.kernel.org/patch/11214719/

    Satya Tangirala
     

26 Oct, 2019

1 commit

  • Simple reordering of __bi_remaining can reduce bio size by 8 bytes that
    are now wasted on padding (measured on x86_64):

    struct bio {
            struct bio *          bi_next;          /*     0     8 */
            struct gendisk *      bi_disk;          /*     8     8 */
            unsigned int          bi_opf;           /*    16     4 */
            short unsigned int    bi_flags;         /*    20     2 */
            short unsigned int    bi_ioprio;        /*    22     2 */
            short unsigned int    bi_write_hint;    /*    24     2 */
            blk_status_t          bi_status;        /*    26     1 */
            u8                    bi_partno;        /*    27     1 */

            /* XXX 4 bytes hole, try to pack */

            struct bvec_iter      bi_iter;          /*    32    24 */

            /* XXX last struct has 4 bytes of padding */

            atomic_t              __bi_remaining;   /*    56     4 */

            /* XXX 4 bytes hole, try to pack */
    [...]
            /* size: 104, cachelines: 2, members: 19 */
            /* sum members: 96, holes: 2, sum holes: 8 */
            /* paddings: 1, sum paddings: 4 */
            /* last cacheline: 40 bytes */
    };

    Now becomes:

    struct bio {
            struct bio *          bi_next;          /*     0     8 */
            struct gendisk *      bi_disk;          /*     8     8 */
            unsigned int          bi_opf;           /*    16     4 */
            short unsigned int    bi_flags;         /*    20     2 */
            short unsigned int    bi_ioprio;        /*    22     2 */
            short unsigned int    bi_write_hint;    /*    24     2 */
            blk_status_t          bi_status;        /*    26     1 */
            u8                    bi_partno;        /*    27     1 */
            atomic_t              __bi_remaining;   /*    28     4 */
            struct bvec_iter      bi_iter;          /*    32    24 */

            /* XXX last struct has 4 bytes of padding */
    [...]
            /* size: 96, cachelines: 2, members: 19 */
            /* paddings: 1, sum paddings: 4 */
            /* last cacheline: 32 bytes */
    };

    Signed-off-by: David Sterba
    Signed-off-by: Jens Axboe

    David Sterba
     

29 Aug, 2019

1 commit

  • This patchset implements an IO cost model based work-conserving
    proportional controller.

    While io.latency provides the capability to comprehensively prioritize
    and protect IOs depending on the cgroups, its protection is binary -
    the lowest latency target cgroup which is suffering is protected at
    the cost of all others. In many use cases including stacking multiple
    workload containers in a single system, it's necessary to distribute
    IO capacity with better granularity.

    One challenge of controlling IO resources is the lack of trivially
    observable cost metric. The most common metrics - bandwidth and iops
    - can be off by orders of magnitude depending on the device type and
    IO pattern. However, the cost isn't a complete mystery. Given
    several key attributes, we can make fairly reliable predictions on how
    expensive a given stream of IOs would be, at least compared to other
    IO patterns.

    The function which determines the cost of a given IO is the IO cost
    model for the device. This controller distributes IO capacity based
    on the costs estimated by such model. The more accurate the cost
    model the better but the controller adapts based on IO completion
    latency, and as long as the relative costs across different IO
    patterns are consistent and sensible, it'll adapt to the actual
    performance of the device.

    Currently, the only implemented cost model is a simple linear one with
    a few sets of default parameters for different classes of device.
    This covers most common devices reasonably well. All the
    infrastructure to tune and add different cost models is already in
    place and a later patch will also allow using bpf progs for cost
    models.

    Please see the top comment in blk-iocost.c and documentation for
    more details.

    v2: Rebased on top of RQ_ALLOC_TIME changes and folded in Rik's fix
    for a divide-by-zero bug in current_hweight() triggered by zero
    inuse_sum.

    Signed-off-by: Tejun Heo
    Cc: Andy Newell
    Cc: Josef Bacik
    Cc: Rik van Riel
    Signed-off-by: Jens Axboe

    Tejun Heo
     

14 Aug, 2019

1 commit

  • psi tracks the time tasks wait for refaulting pages to become
    uptodate, but it does not track the time spent submitting the IO. The
    submission part can be significant if backing storage is contended or
    when cgroup throttling (io.latency) is in effect - a lot of time is
    spent in submit_bio(). In that case, we underreport memory pressure.

    Annotate submit_bio() to account submission time as memory stall when
    the bio is reading userspace workingset pages.

    Tested-by: Suren Baghdasaryan
    Signed-off-by: Johannes Weiner
    Signed-off-by: Jens Axboe

    Johannes Weiner
     

05 Aug, 2019

1 commit

  • This patch introduces a new request operation REQ_OP_ZONE_RESET_ALL.
    This is useful for applications like mkfs that need to reset all the
    zones present on the underlying block device. As part of this patch we
    also introduce the new QUEUE_FLAG_ZONE_RESETALL flag, which indicates
    the queue's zone reset all capability, and a corresponding helper macro.

    Reviewed-by: Damien Le Moal
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Chaitanya Kulkarni
    Signed-off-by: Jens Axboe

    Chaitanya Kulkarni
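    A sketch of how a caller might use the new capability, assuming the
    helper macro is named blk_queue_zone_resetall(); the per-zone fallback
    loop is elided and the function name is illustrative:

        #include <linux/bio.h>
        #include <linux/blkdev.h>

        static int example_reset_all_zones(struct block_device *bdev)
        {
                struct bio bio;

                if (!blk_queue_zone_resetall(bdev_get_queue(bdev)))
                        return -EOPNOTSUPP; /* issue per-zone REQ_OP_ZONE_RESET instead */

                /* A single zero-length bio resets every zone on the device. */
                bio_init(&bio, NULL, 0);
                bio_set_dev(&bio, bdev);
                bio.bi_opf = REQ_OP_ZONE_RESET_ALL | REQ_SYNC;
                return submit_bio_wait(&bio);
        }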
     

22 Jul, 2019

1 commit

  • By default, if a caller sets REQ_NOWAIT and we need to block, we'll
    return -EAGAIN through the bio->bi_end_io() callback. For some use
    cases, this makes it hard to use.

    Allow a caller to ask for inline return of errors related to
    blocking by also setting REQ_NOWAIT_INLINE.

    Signed-off-by: Jens Axboe

    Jens Axboe
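    A sketch of the intended calling convention, assuming the blk_qc_t
    cookie returned by submit_bio() in this era and the BLK_QC_T_EAGAIN
    value (note that both the flag and the cookie were removed again by
    the later commits above); the wrapper name is illustrative:

        #include <linux/bio.h>

        static bool example_try_submit_nowait(struct bio *bio)
        {
                bio->bi_opf |= REQ_NOWAIT | REQ_NOWAIT_INLINE;
                /* Returns false if the submission would have blocked;
                 * the caller can retry from a context that may sleep. */
                return submit_bio(bio) != BLK_QC_T_EAGAIN;
        }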
     

10 Jul, 2019

1 commit

  • When a shared kthread needs to issue a bio for a cgroup, doing so
    synchronously can lead to priority inversions as the kthread can be
    trapped waiting for that cgroup. This patch implements
    REQ_CGROUP_PUNT flag which makes submit_bio() punt the actual issuing
    to a dedicated per-blkcg work item to avoid such priority inversions.

    This will be used to fix priority inversions in btrfs compression and
    should be generally useful as we grow filesystem support for
    comprehensive IO control.

    Cc: Chris Mason
    Reviewed-by: Josef Bacik
    Reviewed-by: Jan Kara
    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     

21 Jun, 2019

1 commit

  • We only need the number of segments in the blk-mq submission path.
    Remove the field from struct bio, and return it from a variant of
    blk_queue_split instead so that it can be passed as an argument to
    those functions that need the value.

    This also means we stop recounting segments except for cloning
    and partial segments.

    To keep the number of arguments in this hot path down, remove
    pointless struct request_queue arguments from any of the functions
    that had one and grew a nr_segs argument.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

24 May, 2019

1 commit