08 May, 2019

1 commit

  • Pull block updates from Jens Axboe:
    "Nothing major in this series, just fixes and improvements all over the
    map. This contains:

    - Series of fixes for sed-opal (David, Jonas)

    - Fixes and performance tweaks for BFQ (via Paolo)

    - Set of fixes for bcache (via Coly)

    - Set of fixes for md (via Song)

    - Enabling multi-page for passthrough requests (Ming)

    - Queue release fix series (Ming)

    - Device notification improvements (Martin)

    - Propagate underlying device rotational status in loop (Holger)

    - Removal of mtip32xx trim support, which has been disabled for years
    (Christoph)

    - Improvement and cleanup of nvme command handling (Christoph)

    - Add block SPDX tags (Christoph)

    - Cleanup/hardening of bio/bvec iteration (Christoph)

    - A few NVMe pull requests (Christoph)

    - Removal of CONFIG_LBDAF (Christoph)

    - Various little fixes here and there"

    * tag 'for-5.2/block-20190507' of git://git.kernel.dk/linux-block: (164 commits)
    block: fix mismerge in bvec_advance
    block: don't drain in-progress dispatch in blk_cleanup_queue()
    blk-mq: move cancel of hctx->run_work into blk_mq_hw_sysfs_release
    blk-mq: always free hctx after request queue is freed
    blk-mq: split blk_mq_alloc_and_init_hctx into two parts
    blk-mq: free hw queue's resource in hctx's release handler
    blk-mq: move cancel of requeue_work into blk_mq_release
    blk-mq: grab .q_usage_counter when queuing request from plug code path
    block: fix function name in comment
    nvmet: protect discovery change log event list iteration
    nvme: mark nvme_core_init and nvme_core_exit static
    nvme: move command size checks to the core
    nvme-fabrics: check more command sizes
    nvme-pci: check more command sizes
    nvme-pci: remove an unneeded variable initialization
    nvme-pci: unquiesce admin queue on shutdown
    nvme-pci: shutdown on timeout during deletion
    nvme-pci: fix psdt field for single segment sgls
    nvme-multipath: don't print ANA group state by default
    nvme-multipath: split bios with the ns_head bio_set before submitting
    ...

    Linus Torvalds
     

07 May, 2019

1 commit

  • Pull crypto update from Herbert Xu:
    "API:
    - Add support for AEAD in simd
    - Add fuzz testing to testmgr
    - Add panic_on_fail module parameter to testmgr
    - Use a per-CPU struct instead of multiple variables in scompress
    - Change verify API for akcipher

    Algorithms:
    - Convert x86 AEAD algorithms over to simd
    - Forbid 2-key 3DES in FIPS mode
    - Add EC-RDSA (GOST 34.10) algorithm

    Drivers:
    - Set output IV with ctr-aes in crypto4xx
    - Set output IV in rockchip
    - Fix potential length overflow with hashing in sun4i-ss
    - Fix computation error with ctr in vmx
    - Add SM4 protected keys support in ccree
    - Remove long-broken mxc-scc driver
    - Add rfc4106(gcm(aes)) cipher support in cavium/nitrox"

    * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (179 commits)
    crypto: ccree - use a proper le32 type for le32 val
    crypto: ccree - remove set but not used variable 'du_size'
    crypto: ccree - Make cc_sec_disable static
    crypto: ccree - fix spelling mistake "protedcted" -> "protected"
    crypto: caam/qi2 - generate hash keys in-place
    crypto: caam/qi2 - fix DMA mapping of stack memory
    crypto: caam/qi2 - fix zero-length buffer DMA mapping
    crypto: stm32/cryp - update to return iv_out
    crypto: stm32/cryp - remove request mutex protection
    crypto: stm32/cryp - add weak key check for DES
    crypto: atmel - remove set but not used variable 'alg_name'
    crypto: picoxcell - Use dev_get_drvdata()
    crypto: crypto4xx - get rid of redundant using_sd variable
    crypto: crypto4xx - use sync skcipher for fallback
    crypto: crypto4xx - fix cfb and ofb "overran dst buffer" issues
    crypto: crypto4xx - fix ctr-aes missing output IV
    crypto: ecrdsa - select ASN1 and OID_REGISTRY for EC-RDSA
    crypto: ux500 - use ccflags-y instead of CFLAGS_.o
    crypto: ccree - handle tee fips error during power management resume
    crypto: ccree - add function to handle cryptocell tee fips error
    ...

    Linus Torvalds
     

01 May, 2019

1 commit


30 Apr, 2019

3 commits

  • We only have two callers that need the integer loop iterator, and they
    can easily maintain it themselves.

    Suggested-by: Matthew Wilcox
    Reviewed-by: Johannes Thumshirn
    Acked-by: David Sterba
    Reviewed-by: Hannes Reinecke
    Acked-by: Coly Li
    Reviewed-by: Matthew Wilcox
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Use a variable containing the buffer address instead of the
    to-be-removed integer iterator of bio_for_each_segment_all.

    Suggested-by: Matthew Wilcox
    Reviewed-by: Hannes Reinecke
    Acked-by: Coly Li
    Reviewed-by: Matthew Wilcox
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Commit 95f18c9d1310 ("bcache: avoid potential memleak of list of
    journal_replay(s) in the CACHE_SYNC branch of run_cache_set") forgot
    to remove the original definition of LIST_HEAD(journal), which keeps
    the change from taking effect. This patch removes the redundant
    LIST_HEAD(journal) variable from run_cache_set(), so that Shenghui's
    fix works as intended.

    Fixes: 95f18c9d1310 ("bcache: avoid potential memleak of list of journal_replay(s) in the CACHE_SYNC branch of run_cache_set")
    Reported-by: Juha Aatrokoski
    Cc: Shenghui Wang
    Signed-off-by: Coly Li
    Signed-off-by: Jens Axboe

    Coly Li
     

29 Apr, 2019

2 commits

  • Replace the indirection through struct stack_trace with an invocation of
    the storage array based interface. This results in less storage space and
    indirection.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Josh Poimboeuf
    Cc: Andy Lutomirski
    Cc: dm-devel@redhat.com
    Cc: Mike Snitzer
    Cc: Alasdair Kergon
    Cc: Steven Rostedt
    Cc: Alexander Potapenko
    Cc: Alexey Dobriyan
    Cc: Andrew Morton
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: linux-mm@kvack.org
    Cc: David Rientjes
    Cc: Catalin Marinas
    Cc: Dmitry Vyukov
    Cc: Andrey Ryabinin
    Cc: kasan-dev@googlegroups.com
    Cc: Mike Rapoport
    Cc: Akinobu Mita
    Cc: Christoph Hellwig
    Cc: iommu@lists.linux-foundation.org
    Cc: Robin Murphy
    Cc: Marek Szyprowski
    Cc: Johannes Thumshirn
    Cc: David Sterba
    Cc: Chris Mason
    Cc: Josef Bacik
    Cc: linux-btrfs@vger.kernel.org
    Cc: Daniel Vetter
    Cc: intel-gfx@lists.freedesktop.org
    Cc: Joonas Lahtinen
    Cc: Maarten Lankhorst
    Cc: dri-devel@lists.freedesktop.org
    Cc: David Airlie
    Cc: Jani Nikula
    Cc: Rodrigo Vivi
    Cc: Tom Zanussi
    Cc: Miroslav Benes
    Cc: linux-arch@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190425094802.533968922@linutronix.de

    Thomas Gleixner
     
  • Replace the indirection through struct stack_trace with an invocation of
    the storage array based interface.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Josh Poimboeuf
    Cc: Andy Lutomirski
    Cc: dm-devel@redhat.com
    Cc: Mike Snitzer
    Cc: Alasdair Kergon
    Cc: Steven Rostedt
    Cc: Alexander Potapenko
    Cc: Alexey Dobriyan
    Cc: Andrew Morton
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: linux-mm@kvack.org
    Cc: David Rientjes
    Cc: Catalin Marinas
    Cc: Dmitry Vyukov
    Cc: Andrey Ryabinin
    Cc: kasan-dev@googlegroups.com
    Cc: Mike Rapoport
    Cc: Akinobu Mita
    Cc: Christoph Hellwig
    Cc: iommu@lists.linux-foundation.org
    Cc: Robin Murphy
    Cc: Marek Szyprowski
    Cc: Johannes Thumshirn
    Cc: David Sterba
    Cc: Chris Mason
    Cc: Josef Bacik
    Cc: linux-btrfs@vger.kernel.org
    Cc: Daniel Vetter
    Cc: intel-gfx@lists.freedesktop.org
    Cc: Joonas Lahtinen
    Cc: Maarten Lankhorst
    Cc: dri-devel@lists.freedesktop.org
    Cc: David Airlie
    Cc: Jani Nikula
    Cc: Rodrigo Vivi
    Cc: Tom Zanussi
    Cc: Miroslav Benes
    Cc: linux-arch@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190425094802.446326191@linutronix.de

    Thomas Gleixner
     

25 Apr, 2019

19 commits

  • The flags field in 'struct shash_desc' never actually does anything.
    The only ostensibly supported flag is CRYPTO_TFM_REQ_MAY_SLEEP.
    However, no shash algorithm ever sleeps, making this flag a no-op.

    With this being the case, inevitably some users who can't sleep wrongly
    pass MAY_SLEEP. These would all need to be fixed if any shash algorithm
    actually started sleeping. For example, the shash_ahash_*() functions,
    which wrap a shash algorithm with the ahash API, pass through MAY_SLEEP
    from the ahash API to the shash API. However, the shash functions are
    called under kmap_atomic(), so actually they're assumed to never sleep.

    Even if it turns out that some users do need preemption points while
    hashing large buffers, we could easily provide a helper function
    crypto_shash_update_large() which divides the data into smaller chunks
    and calls crypto_shash_update() and cond_resched() for each chunk. It's
    not necessary to have a flag in 'struct shash_desc', nor is it necessary
    to make individual shash algorithms aware of this at all.

    Therefore, remove shash_desc::flags, and document that the
    crypto_shash_*() functions can be called from any context.

    Signed-off-by: Eric Biggers
    Signed-off-by: Herbert Xu

    Eric Biggers
     
  • …ranch of run_cache_set

    In the CACHE_SYNC branch of run_cache_set(), LIST_HEAD(journal) is used
    to collect journal_replay(s) and filled by bch_journal_read().

    If all goes well, bch_journal_replay() will release the list of
    journal_replay(s) at the end of the branch.

    If something goes wrong, code flow will jump to the label "err:" and
    leave the list unreleased.

    This patch releases the list of journal_replay(s) when an error is
    detected.

    v1 -> v2:
    * Move the release code to the location after label 'err:' to
    simplify the change.

    Signed-off-by: Shenghui Wang <shhuiw@foxmail.com>
    Signed-off-by: Coly Li <colyli@suse.de>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

    Shenghui Wang
     
  • Elements of keylist should be accessed before the list is freed.
    Move bch_keylist_free() calling after the while loop to avoid wrong
    content accessed.

    Signed-off-by: Shenghui Wang
    Signed-off-by: Coly Li
    Signed-off-by: Jens Axboe

    Shenghui Wang
     
  • journal replay failed with messages:
    Sep 10 19:10:43 ceph kernel: bcache: error on
    bb379a64-e44e-4812-b91d-a5599871a3b1: bcache: journal entries
    2057493-2057567 missing! (replaying 2057493-2076601), disabling
    caching

    The reason is that in journal_reclaim(), when discard is enabled, we
    send the discard command and reclaim those journal buckets whose seq
    is older than last_seq_now. But if the machine is restarted before we
    write a journal with last_seq_now, the journal with last_seq_now is
    never written to the journal bucket, and the last_seq_wrote in the
    newest journal is older than the last_seq_now we expect it to be, so
    when we do replay, journals from last_seq_wrote to last_seq_now are
    missing.

    It's hard to write a journal immediately after journal_reclaim(), and
    it is harmless if those missed journals were caused by discarding,
    since their contents were already written to btree nodes. So, if the
    missing seqs start from the beginning journal, we treat it as normal,
    only print a message to show the missing journals, and point out that
    it may be caused by discarding.

    Patch v2 adds a judgement condition to ignore the missed journals
    only when discard is enabled, as Coly suggested.

    (Coly Li: rebase the patch with other changes in bch_journal_replay())

    Signed-off-by: Tang Junhui
    Tested-by: Dennis Schridde
    Signed-off-by: Coly Li
    Signed-off-by: Jens Axboe

    Tang Junhui
     
  • This patch tries to release the mutex bch_register_lock early, to
    give a chance to stop the cache set and bcache devices early.

    This patch also extends the timeout for stopping all bcache devices
    from 2 seconds to 10 seconds, because stopping the writeback rate
    update worker may be delayed for up to 5 seconds; 2 seconds is not
    enough.

    With this patch applied, hangs while stopping bcache devices during
    system reboot or shutdown are very hard to observe any more.

    Signed-off-by: Coly Li
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Coly Li
     
  • Add code comments to explain which callback function might be called
    for closure_queue(). This is an effort to make the code more
    understandable for readers.

    Signed-off-by: Coly Li
    Reviewed-by: Chaitanya Kulkarni
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Coly Li
     
  • Add comments to explain why in register_bcache() blkdev_put() won't
    be called in two locations. Add comments to explain why blkdev_put()
    must be called in register_cache() when cache_alloc() fails.

    Signed-off-by: Coly Li
    Reviewed-by: Chaitanya Kulkarni
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Coly Li
     
  • This patch adds a return value to register_bdev(), so that if a
    failure happens inside register_bdev(), its caller register_bcache()
    can detect and handle the failure properly.

    Signed-off-by: Coly Li
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Coly Li
     
  • When a failure happens inside bch_journal_replay(), calling
    cache_set_err_on() and handling the failure asynchronously is not a
    good idea. After bch_journal_replay() returns, the registering code
    continues to execute the following steps while the unregistering code
    triggered by cache_set_err_on() runs at the same time. First, it is
    unnecessary to handle the failure and unregister the cache set
    asynchronously; second, there might be a potential race condition
    between the register and unregister code running for the same cache
    set.

    So in this patch, if a failure happens in bch_journal_replay(), we
    don't call cache_set_err_on(); we just print the same error message
    to the kernel message buffer, then return -EIO immediately to the
    caller. The caller can then detect such a failure and handle it in a
    synchronized way.

    Signed-off-by: Coly Li
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Coly Li
     
  • Bcache has several routines to release resources in implicit way, they
    are called when the associated kobj released. This patch adds code
    comments to notice when and which release callback will be called,
    - When dc->disk.kobj released:
    void bch_cached_dev_release(struct kobject *kobj)
    - When d->kobj released:
    void bch_flash_dev_release(struct kobject *kobj)
    - When c->kobj released:
    void bch_cache_set_release(struct kobject *kobj)
    - When ca->kobj released
    void bch_cache_release(struct kobject *kobj)

    Signed-off-by: Coly Li
    Reviewed-by: Chaitanya Kulkarni
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Coly Li
     
  • Currently run_cache_set() has no return value, so if there is a
    failure in bch_journal_replay(), the caller of run_cache_set() has no
    idea about it and just continues to execute the code following
    run_cache_set(). The internal failure is triggered inside
    bch_journal_replay() and handled asynchronously. This behavior is
    inefficient: while the failure is being handled inside
    bch_journal_replay(), the cache register code is still running to
    start the cache set. Registering and unregistering code running at
    the same time may introduce rare race conditions and makes the code
    harder to understand.

    This patch adds a return value to run_cache_set(), and returns -EIO
    if bch_journal_replay() fails. The caller of run_cache_set() can then
    detect such a failure and stop the registering code flow immediately
    inside register_cache_set().

    If journal replay fails, run_cache_set() can report the error
    immediately to register_cache_set(). This patch makes the failure
    handling for bch_journal_replay() synchronized, easier to understand
    and debug, and avoids the potential race condition of registering and
    unregistering at the same time.

    Signed-off-by: Coly Li
    Signed-off-by: Jens Axboe

    Coly Li
     
  • In journal_reclaim(), ja->cur_idx of each cache will be updated to
    reclaim available journal buckets. Variable 'int n' is used to count
    how many caches are successfully reclaimed, then n is stored into
    c->journal.key by SET_KEY_PTRS(). Later in journal_write_unlocked(),
    a for_each_cache() loop will write the jset data onto each cache.

    The problem is, if all journal buckets on each cache are full, in the
    following code in journal_reclaim(),

    529 for_each_cache(ca, c, iter) {
    530         struct journal_device *ja = &ca->journal;
    531         unsigned int next = (ja->cur_idx + 1) % ca->sb.njournal_buckets;
    532
    533         /* No space available on this device */
    534         if (next == ja->discard_idx)
    535                 continue;
    536
    537         ja->cur_idx = next;
    538         k->ptr[n++] = MAKE_PTR(0,
    539                 bucket_to_sector(c, ca->sb.d[ja->cur_idx]),
    540                 ca->sb.nr_this_dev);
    541 }
    542
    543 bkey_init(k);
    544 SET_KEY_PTRS(k, n);

    if there is no available bucket to reclaim, the if() condition at
    line 534 will always be true, and n remains 0. Then at line 544,
    SET_KEY_PTRS() will set the KEY_PTRS field of c->journal.key to 0.

    Setting the KEY_PTRS field of c->journal.key to 0 is wrong, because
    in journal_write_unlocked() the journal data is written in the
    following loop,

    649 for (i = 0; i < KEY_PTRS(k); i++) {
    650-671         submit journal data to cache device
    672 }

    If the KEY_PTRS field is set to 0 in journal_reclaim(), the journal
    data won't be written to the cache device here. If the system crashed
    or rebooted before the bkeys of the lost journal entries were written
    into btree nodes, data corruption will be reported during bcache
    reload after rebooting the system.

    Since there is only one cache in a cache set, there is no need to set
    the KEY_PTRS field in journal_reclaim() at all. But in order to keep
    the for_each_cache() logic consistent for now, this patch fixes the
    above problem by not setting KEY_PTRS of the journal key to 0 when
    there is no bucket available to reclaim.

    Signed-off-by: Coly Li
    Reviewed-by: Hannes Reinecke
    Cc: stable@vger.kernel.org
    Signed-off-by: Jens Axboe

    Coly Li
     
  • 'int ret' is defined as a local variable inside macro read_bucket().
    Since this macro is called multiple times, and the following patches
    will use an 'int ret' variable in bch_journal_read(), this patch
    moves the definition of 'int ret' from macro read_bucket() into the
    scope of function bch_journal_read().

    Signed-off-by: Coly Li
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Coly Li
     
  • There is a race between cache device register and cache set unregister.
    For an already registered cache device, register_bcache will call
    bch_is_open to iterate through all cachesets and check every cache
    there. The race occurs if cache_set_free executes at the same time and
    clears the caches right before ca is dereferenced in bch_is_open_cache.
    To close the race, let's make sure the clean up work is protected by
    the bch_register_lock as well.

    This issue can be reproduced as follows,
    while true; do echo /dev/XXX > /sys/fs/bcache/register ; done &
    while true; do echo 1 > /sys/block/XXX/bcache/set/unregister ; done &

    and results in the following oops,

    [ +0.000053] BUG: unable to handle kernel NULL pointer dereference at 0000000000000998
    [ +0.000457] #PF error: [normal kernel read fault]
    [ +0.000464] PGD 800000003ca9d067 P4D 800000003ca9d067 PUD 3ca9c067 PMD 0
    [ +0.000388] Oops: 0000 [#1] SMP PTI
    [ +0.000269] CPU: 1 PID: 3266 Comm: bash Not tainted 5.0.0+ #6
    [ +0.000346] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.fc28 04/01/2014
    [ +0.000472] RIP: 0010:register_bcache+0x1829/0x1990 [bcache]
    [ +0.000344] Code: b0 48 83 e8 50 48 81 fa e0 e1 10 c0 0f 84 a9 00 00 00 48 89 c6 48 89 ca 0f b7 ba 54 04 00 00 4c 8b 82 60 0c 00 00 85 ff 74 2f 3b a8 98 09 00 00 74 4e 44 8d 47 ff 31 ff 49 c1 e0 03 eb 0d
    [ +0.000839] RSP: 0018:ffff92ee804cbd88 EFLAGS: 00010202
    [ +0.000328] RAX: ffffffffc010e190 RBX: ffff918b5c6b5000 RCX: ffff918b7d8e0000
    [ +0.000399] RDX: ffff918b7d8e0000 RSI: ffffffffc010e190 RDI: 0000000000000001
    [ +0.000398] RBP: ffff918b7d318340 R08: 0000000000000000 R09: ffffffffb9bd2d7a
    [ +0.000385] R10: ffff918b7eb253c0 R11: ffffb95980f51200 R12: ffffffffc010e1a0
    [ +0.000411] R13: fffffffffffffff2 R14: 000000000000000b R15: ffff918b7e232620
    [ +0.000384] FS: 00007f955bec2740(0000) GS:ffff918b7eb00000(0000) knlGS:0000000000000000
    [ +0.000420] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ +0.000801] CR2: 0000000000000998 CR3: 000000003cad6000 CR4: 00000000001406e0
    [ +0.000837] Call Trace:
    [ +0.000682] ? _cond_resched+0x10/0x20
    [ +0.000691] ? __kmalloc+0x131/0x1b0
    [ +0.000710] kernfs_fop_write+0xfa/0x170
    [ +0.000733] __vfs_write+0x2e/0x190
    [ +0.000688] ? inode_security+0x10/0x30
    [ +0.000698] ? selinux_file_permission+0xd2/0x120
    [ +0.000752] ? security_file_permission+0x2b/0x100
    [ +0.000753] vfs_write+0xa8/0x1a0
    [ +0.000676] ksys_write+0x4d/0xb0
    [ +0.000699] do_syscall_64+0x3a/0xf0
    [ +0.000692] entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Signed-off-by: Liang Chen
    Cc: stable@vger.kernel.org
    Signed-off-by: Coly Li
    Signed-off-by: Jens Axboe

    Liang Chen
     
  • There are a few nits in this function. They could in theory all
    be separate patches, but that's probably taking small commits
    too far.

    1) I added a brief comment saying what it does.

    2) I like to declare pointer parameters "const" where possible
    for documentation reasons.

    3) It uses bitmap_weight(&rand, BITS_PER_LONG) to compute the Hamming
    weight of a 32-bit random number (giving a random integer with
    mean 16 and variance 8). Passing by reference in a 64-bit variable
    is silly; just use hweight32().

    4) Its helper function fract_exp_two is unnecessarily tangled.
    Gcc can optimize the multiply by (1 << x) to a shift, but it can
    be written in a much more straightforward way at the cost of one
    more bit of internal precision. Some analysis reveals that this
    bit is always available.

    This shrinks the object code for fract_exp_two(x, 6) from 23 bytes:

    0000000000000000 :
    0: 89 f9 mov %edi,%ecx
    2: c1 e9 06 shr $0x6,%ecx
    5: b8 01 00 00 00 mov $0x1,%eax
    a: d3 e0 shl %cl,%eax
    c: 83 e7 3f and $0x3f,%edi
    f: d3 e7 shl %cl,%edi
    11: c1 ef 06 shr $0x6,%edi
    14: 01 f8 add %edi,%eax
    16: c3 retq

    To 19 bytes:

    0000000000000017 :
    17: 89 f8 mov %edi,%eax
    19: 83 e0 3f and $0x3f,%eax
    1c: 83 c0 40 add $0x40,%eax
    1f: 89 f9 mov %edi,%ecx
    21: c1 e9 06 shr $0x6,%ecx
    24: d3 e0 shl %cl,%eax
    26: c1 e8 06 shr $0x6,%eax
    29: c3 retq

    (Verified with 0 < 16<
    Signed-off-by: Coly Li
    Signed-off-by: Jens Axboe

    George Spelvin
     
  • This patch uses kmemdup_nul to create a NUL-terminated string from
    dc->sb.label. This is better than open coding it.

    With this, we can move env[2] initialization into env[] array to make
    code more elegant.

    Signed-off-by: Geliang Tang
    Signed-off-by: Coly Li
    Signed-off-by: Jens Axboe

    Geliang Tang
     
  • clang has identified a code path in which it thinks a
    variable may be used uninitialized:

    drivers/md/bcache/alloc.c:333:4: error: variable 'bucket' is used uninitialized whenever 'if' condition is false
    [-Werror,-Wsometimes-uninitialized]
    fifo_pop(&ca->free_inc, bucket);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    drivers/md/bcache/util.h:219:27: note: expanded from macro 'fifo_pop'
    #define fifo_pop(fifo, i) fifo_pop_front(fifo, (i))
    ^~~~~~~~~~~~~~~~~~~~~~~~~
    drivers/md/bcache/util.h:189:6: note: expanded from macro 'fifo_pop_front'
    if (_r) { \
    ^~
    drivers/md/bcache/alloc.c:343:46: note: uninitialized use occurs here
    allocator_wait(ca, bch_allocator_push(ca, bucket));
    ^~~~~~
    drivers/md/bcache/alloc.c:287:7: note: expanded from macro 'allocator_wait'
    if (cond) \
    ^~~~
    drivers/md/bcache/alloc.c:333:4: note: remove the 'if' if its condition is always true
    fifo_pop(&ca->free_inc, bucket);
    ^
    drivers/md/bcache/util.h:219:27: note: expanded from macro 'fifo_pop'
    #define fifo_pop(fifo, i) fifo_pop_front(fifo, (i))
    ^
    drivers/md/bcache/util.h:189:2: note: expanded from macro 'fifo_pop_front'
    if (_r) { \
    ^
    drivers/md/bcache/alloc.c:331:15: note: initialize the variable 'bucket' to silence this warning
    long bucket;
    ^

    This cannot happen in practice because we only enter the loop
    if there is at least one element in the list.

    Slightly rearranging the code makes this clearer to both the
    reader and the compiler, which avoids the warning.

    Signed-off-by: Arnd Bergmann
    Reviewed-by: Nathan Chancellor
    Signed-off-by: Coly Li
    Signed-off-by: Jens Axboe

    Arnd Bergmann
     
  • To get the number of unused buckets in sysfs_priority_stats, the
    code counts the buckets whose GC_SECTORS_USED is zero. That is
    correct and should not be overwritten by the count of buckets whose
    prio is zero.

    Signed-off-by: Guoju Fang
    Signed-off-by: Coly Li
    Signed-off-by: Jens Axboe

    Guoju Fang
     
  • The bio from the upper layer is considered completed when
    bio_complete() returns. In most scenarios bio_complete() is called in
    search_free(), but when a read miss happens, bio_complete() is called
    when the backing device read completes, while the struct search is
    still in use until cache insertion finishes.

    If someone stops the bcache device just then, the device may be
    closed and released, and after cache insertion finishes the struct
    search will access a freed struct cached_dev.

    This patch adds a reference to the bcache device before
    bio_complete() when a read miss happens, and puts it after the search
    is no longer used.

    Signed-off-by: Guoju Fang
    Signed-off-by: Coly Li
    Signed-off-by: Jens Axboe

    Guoju Fang
     

17 Apr, 2019

3 commits

  • The problem is that any 'uptodate' vs 'disks' check is not precise
    in this path. Put a WARN_ON(!test_bit(R5_UPTODATE, &dev->flags)) on
    the device that might try to kick off writes and then skip the
    action. Better to prevent the raid driver from taking unexpected
    action *and* keep the system alive than to kill the machine with
    BUG_ON.

    Note: fixed warning reported by kbuild test robot

    Signed-off-by: Dan Williams
    Signed-off-by: Nigel Croxon
    Signed-off-by: Song Liu

    Nigel Croxon
     
  • This reverts commit 4f4fd7c5798bbdd5a03a60f6269cf1177fbd11ef.

    Cc: Dan Williams
    Cc: Nigel Croxon
    Cc: Xiao Ni
    Signed-off-by: Song Liu

    Song Liu
     
  • Mdadm expects that setting a drive as faulty will fail with -EBUSY
    only if this operation would cause the RAID to fail. If this happens,
    it will try to stop the array. Currently -EBUSY might also be
    returned if the rdev is in the middle of the removal process - for
    example, there is a race with mdmon that has already requested the
    drive to be failed/removed.

    If rdev does not contain mddev, return -ENODEV instead, so the caller
    can distinguish between those two cases and behave accordingly.

    Reviewed-by: NeilBrown
    Signed-off-by: Pawel Baldysiak
    Signed-off-by: Song Liu

    Pawel Baldysiak
     

15 Apr, 2019

1 commit

  • Pull in v5.1-rc5 to resolve two conflicts. One is in BFQ, in just
    a comment, and is trivial. The other one is a conflict due to a
    later fix in the bio multi-page work, and needs a bit more care.

    * tag 'v5.1-rc5': (476 commits)
    Linux 5.1-rc5
    fs: prevent page refcount overflow in pipe_buf_get
    mm: prevent get_user_pages() from overflowing page refcount
    mm: add 'try_get_page()' helper function
    mm: make page ref count overflow check tighter and more explicit
    clk: imx: Fix PLL_1416X not rounding rates
    clk: mediatek: fix clk-gate flag setting
    arm64: futex: Fix FUTEX_WAKE_OP atomic ops with non-zero result value
    iommu/amd: Set exclusion range correctly
    clang-format: Update with the latest for_each macro list
    perf/core: Fix perf_event_disable_inatomic() race
    block: fix the return errno for direct IO
    Revert "SUNRPC: Micro-optimise when the task is known not to be sleeping"
    NFSv4.1 fix incorrect return value in copy_file_range
    xprtrdma: Fix helper that drains the transport
    NFS: Fix handling of reply page vector
    NFS: Forbid setting AF_INET6 to "struct sockaddr_in"->sin_family.
    dma-debug: only skip one stackframe entry
    platform/x86: pmc_atom: Drop __initconst on dmi table
    nvmet: fix discover log page when offsets are used
    ...

    Signed-off-by: Jens Axboe

    Jens Axboe
     

11 Apr, 2019

8 commits


07 Apr, 2019

1 commit

  • Currently support for 64-bit sector_t and blkcnt_t is optional on 32-bit
    architectures. These types are required to support block device and/or
    file sizes larger than 2 TiB, and have generally defaulted to on for
    a long time. Enabling the option only increases the i386 tinyconfig
    size by 145 bytes, and many data structures already always use
    64-bit values for their in-core and on-disk data structures anyway,
    so there should not be a large change in dynamic memory usage either.

    Dropping this option removes a somewhat weird non-default config that
    has caused various bugs and compiler warnings when actually used.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig