08 May, 2019

1 commit

  • Pull block updates from Jens Axboe:
    "Nothing major in this series, just fixes and improvements all over the
    map. This contains:

    - Series of fixes for sed-opal (David, Jonas)

    - Fixes and performance tweaks for BFQ (via Paolo)

    - Set of fixes for bcache (via Coly)

    - Set of fixes for md (via Song)

    - Enabling multi-page for passthrough requests (Ming)

    - Queue release fix series (Ming)

    - Device notification improvements (Martin)

    - Propagate underlying device rotational status in loop (Holger)

    - Removal of mtip32xx trim support, which has been disabled for years
    (Christoph)

    - Improvement and cleanup of nvme command handling (Christoph)

    - Add block SPDX tags (Christoph)

    - Cleanup/hardening of bio/bvec iteration (Christoph)

    - A few NVMe pull requests (Christoph)

    - Removal of CONFIG_LBDAF (Christoph)

    - Various little fixes here and there"

    * tag 'for-5.2/block-20190507' of git://git.kernel.dk/linux-block: (164 commits)
    block: fix mismerge in bvec_advance
    block: don't drain in-progress dispatch in blk_cleanup_queue()
    blk-mq: move cancel of hctx->run_work into blk_mq_hw_sysfs_release
    blk-mq: always free hctx after request queue is freed
    blk-mq: split blk_mq_alloc_and_init_hctx into two parts
    blk-mq: free hw queue's resource in hctx's release handler
    blk-mq: move cancel of requeue_work into blk_mq_release
    blk-mq: grab .q_usage_counter when queuing request from plug code path
    block: fix function name in comment
    nvmet: protect discovery change log event list iteration
    nvme: mark nvme_core_init and nvme_core_exit static
    nvme: move command size checks to the core
    nvme-fabrics: check more command sizes
    nvme-pci: check more command sizes
    nvme-pci: remove an unneeded variable initialization
    nvme-pci: unquiesce admin queue on shutdown
    nvme-pci: shutdown on timeout during deletion
    nvme-pci: fix psdt field for single segment sgls
    nvme-multipath: don't print ANA group state by default
    nvme-multipath: split bios with the ns_head bio_set before submitting
    ...

    Linus Torvalds
     

07 May, 2019

1 commit

  • Pull crypto update from Herbert Xu:
    "API:
    - Add support for AEAD in simd
    - Add fuzz testing to testmgr
    - Add panic_on_fail module parameter to testmgr
    - Use a per-CPU struct instead of multiple variables in scompress
    - Change verify API for akcipher

    Algorithms:
    - Convert x86 AEAD algorithms over to simd
    - Forbid 2-key 3DES in FIPS mode
    - Add EC-RDSA (GOST 34.10) algorithm

    Drivers:
    - Set output IV with ctr-aes in crypto4xx
    - Set output IV in rockchip
    - Fix potential length overflow with hashing in sun4i-ss
    - Fix computation error with ctr in vmx
    - Add SM4 protected keys support in ccree
    - Remove long-broken mxc-scc driver
    - Add rfc4106(gcm(aes)) cipher support in cavium/nitrox"

    * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (179 commits)
    crypto: ccree - use a proper le32 type for le32 val
    crypto: ccree - remove set but not used variable 'du_size'
    crypto: ccree - Make cc_sec_disable static
    crypto: ccree - fix spelling mistake "protedcted" -> "protected"
    crypto: caam/qi2 - generate hash keys in-place
    crypto: caam/qi2 - fix DMA mapping of stack memory
    crypto: caam/qi2 - fix zero-length buffer DMA mapping
    crypto: stm32/cryp - update to return iv_out
    crypto: stm32/cryp - remove request mutex protection
    crypto: stm32/cryp - add weak key check for DES
    crypto: atmel - remove set but not used variable 'alg_name'
    crypto: picoxcell - Use dev_get_drvdata()
    crypto: crypto4xx - get rid of redundant using_sd variable
    crypto: crypto4xx - use sync skcipher for fallback
    crypto: crypto4xx - fix cfb and ofb "overran dst buffer" issues
    crypto: crypto4xx - fix ctr-aes missing output IV
    crypto: ecrdsa - select ASN1 and OID_REGISTRY for EC-RDSA
    crypto: ux500 - use ccflags-y instead of CFLAGS_.o
    crypto: ccree - handle tee fips error during power management resume
    crypto: ccree - add function to handle cryptocell tee fips error
    ...

    Linus Torvalds
     

01 May, 2019

1 commit


30 Apr, 2019

3 commits

  • We only have two callers that need the integer loop iterator, and they
    can easily maintain it themselves.

    Suggested-by: Matthew Wilcox
    Reviewed-by: Johannes Thumshirn
    Acked-by: David Sterba
    Reviewed-by: Hannes Reinecke
    Acked-by: Coly Li
    Reviewed-by: Matthew Wilcox
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Use a variable containing the buffer address instead of the
    to-be-removed integer iterator of bio_for_each_segment_all.

    Suggested-by: Matthew Wilcox
    Reviewed-by: Hannes Reinecke
    Acked-by: Coly Li
    Reviewed-by: Matthew Wilcox
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Commit 95f18c9d1310 ("bcache: avoid potential memleak of list of
    journal_replay(s) in the CACHE_SYNC branch of run_cache_set") forgot
    to remove the original definition of LIST_HEAD(journal), which keeps
    the change from taking effect. This patch removes the redundant
    LIST_HEAD(journal) variable from run_cache_set(), so that Shenghui's
    fix works as intended.

    Fixes: 95f18c9d1310 ("bcache: avoid potential memleak of list of journal_replay(s) in the CACHE_SYNC branch of run_cache_set")
    Reported-by: Juha Aatrokoski
    Cc: Shenghui Wang
    Signed-off-by: Coly Li
    Signed-off-by: Jens Axboe

    Coly Li
     

29 Apr, 2019

2 commits

  • Replace the indirection through struct stack_trace with an invocation of
    the storage array based interface. This results in less storage space and
    indirection.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Josh Poimboeuf
    Cc: Andy Lutomirski
    Cc: dm-devel@redhat.com
    Cc: Mike Snitzer
    Cc: Alasdair Kergon
    Cc: Steven Rostedt
    Cc: Alexander Potapenko
    Cc: Alexey Dobriyan
    Cc: Andrew Morton
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: linux-mm@kvack.org
    Cc: David Rientjes
    Cc: Catalin Marinas
    Cc: Dmitry Vyukov
    Cc: Andrey Ryabinin
    Cc: kasan-dev@googlegroups.com
    Cc: Mike Rapoport
    Cc: Akinobu Mita
    Cc: Christoph Hellwig
    Cc: iommu@lists.linux-foundation.org
    Cc: Robin Murphy
    Cc: Marek Szyprowski
    Cc: Johannes Thumshirn
    Cc: David Sterba
    Cc: Chris Mason
    Cc: Josef Bacik
    Cc: linux-btrfs@vger.kernel.org
    Cc: Daniel Vetter
    Cc: intel-gfx@lists.freedesktop.org
    Cc: Joonas Lahtinen
    Cc: Maarten Lankhorst
    Cc: dri-devel@lists.freedesktop.org
    Cc: David Airlie
    Cc: Jani Nikula
    Cc: Rodrigo Vivi
    Cc: Tom Zanussi
    Cc: Miroslav Benes
    Cc: linux-arch@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190425094802.533968922@linutronix.de

    Thomas Gleixner
     
  • Replace the indirection through struct stack_trace with an invocation of
    the storage array based interface.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Josh Poimboeuf
    Cc: Andy Lutomirski
    Cc: dm-devel@redhat.com
    Cc: Mike Snitzer
    Cc: Alasdair Kergon
    Cc: Steven Rostedt
    Cc: Alexander Potapenko
    Cc: Alexey Dobriyan
    Cc: Andrew Morton
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: linux-mm@kvack.org
    Cc: David Rientjes
    Cc: Catalin Marinas
    Cc: Dmitry Vyukov
    Cc: Andrey Ryabinin
    Cc: kasan-dev@googlegroups.com
    Cc: Mike Rapoport
    Cc: Akinobu Mita
    Cc: Christoph Hellwig
    Cc: iommu@lists.linux-foundation.org
    Cc: Robin Murphy
    Cc: Marek Szyprowski
    Cc: Johannes Thumshirn
    Cc: David Sterba
    Cc: Chris Mason
    Cc: Josef Bacik
    Cc: linux-btrfs@vger.kernel.org
    Cc: Daniel Vetter
    Cc: intel-gfx@lists.freedesktop.org
    Cc: Joonas Lahtinen
    Cc: Maarten Lankhorst
    Cc: dri-devel@lists.freedesktop.org
    Cc: David Airlie
    Cc: Jani Nikula
    Cc: Rodrigo Vivi
    Cc: Tom Zanussi
    Cc: Miroslav Benes
    Cc: linux-arch@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190425094802.446326191@linutronix.de

    Thomas Gleixner
     

25 Apr, 2019

19 commits

  • The flags field in 'struct shash_desc' never actually does anything.
    The only ostensibly supported flag is CRYPTO_TFM_REQ_MAY_SLEEP.
    However, no shash algorithm ever sleeps, making this flag a no-op.

    With this being the case, inevitably some users who can't sleep wrongly
    pass MAY_SLEEP. These would all need to be fixed if any shash algorithm
    actually started sleeping. For example, the shash_ahash_*() functions,
    which wrap a shash algorithm with the ahash API, pass through MAY_SLEEP
    from the ahash API to the shash API. However, the shash functions are
    called under kmap_atomic(), so actually they're assumed to never sleep.

    Even if it turns out that some users do need preemption points while
    hashing large buffers, we could easily provide a helper function
    crypto_shash_update_large() which divides the data into smaller chunks
    and calls crypto_shash_update() and cond_resched() for each chunk. It's
    not necessary to have a flag in 'struct shash_desc', nor is it necessary
    to make individual shash algorithms aware of this at all.

    Therefore, remove shash_desc::flags, and document that the
    crypto_shash_*() functions can be called from any context.

    Signed-off-by: Eric Biggers
    Signed-off-by: Herbert Xu

    Eric Biggers
     
  • …ranch of run_cache_set

    In the CACHE_SYNC branch of run_cache_set(), LIST_HEAD(journal) is used
    to collect journal_replay(s) and filled by bch_journal_read().

    If all goes well, bch_journal_replay() will release the list of
    journal_replay(s) at the end of the branch.

    If something goes wrong, code flow will jump to the label "err:" and
    leave the list unreleased.

    This patch releases the list of journal_replay(s) when an error is
    detected.

    v1 -> v2:
    * Move the release code to the location after label 'err:' to
    simplify the change.

    Signed-off-by: Shenghui Wang <shhuiw@foxmail.com>
    Signed-off-by: Coly Li <colyli@suse.de>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

    Shenghui Wang
     
  • Elements of keylist should be accessed before the list is freed.
    Move bch_keylist_free() calling after the while loop to avoid wrong
    content accessed.

    Signed-off-by: Shenghui Wang
    Signed-off-by: Coly Li
    Signed-off-by: Jens Axboe

    Shenghui Wang
     
  • journal replay failed with messages:
    Sep 10 19:10:43 ceph kernel: bcache: error on
    bb379a64-e44e-4812-b91d-a5599871a3b1: bcache: journal entries
    2057493-2057567 missing! (replaying 2057493-2076601), disabling
    caching

    The reason is that in journal_reclaim(), when discard is enabled, we
    send the discard command and reclaim those journal buckets whose seq
    is older than last_seq_now. But if the machine is restarted before we
    write a journal with last_seq_now, the journal with last_seq_now is
    never written to the journal bucket, and the last_seq_wrote in the
    newest journal is older than the last_seq_now we expect it to be, so
    when we do replay, journals from last_seq_wrote to last_seq_now are
    missing.

    It's hard to write a journal immediately after journal_reclaim(), and
    it is harmless if those missed journals were caused by discarding,
    since their contents were already written to btree nodes. So, if the
    missing seqs start from the beginning journal, we treat it as normal,
    only print a message to show the missing journals, and point out that
    it may be caused by discarding.

    Patch v2 adds a judgement condition to ignore the missed journals
    only when discard is enabled, as Coly suggested.

    (Coly Li: rebase the patch with other changes in bch_journal_replay())

    Signed-off-by: Tang Junhui
    Tested-by: Dennis Schridde
    Signed-off-by: Coly Li
    Signed-off-by: Jens Axboe

    Tang Junhui
     
  • This patch tries to release the mutex bch_register_lock early, to
    give a chance to stop the cache set and bcache devices early.

    This patch also extends the timeout for stopping all bcache devices
    from 2 seconds to 10 seconds, because stopping the writeback rate
    update worker may be delayed for up to 5 seconds; 2 seconds is not
    enough.

    With this patch applied, hangs while stopping bcache devices during
    system reboot or shutdown are very hard to observe any more.

    Signed-off-by: Coly Li
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Coly Li
     
  • Add code comments to explain which callback function might be called
    for closure_queue(). This is an effort to make the code more
    understandable for readers.

    Signed-off-by: Coly Li
    Reviewed-by: Chaitanya Kulkarni
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Coly Li
     
  • Add comments to explain why in register_bcache() blkdev_put() won't
    be called in two locations. Add comments to explain why blkdev_put()
    must be called in register_cache() when cache_alloc() fails.

    Signed-off-by: Coly Li
    Reviewed-by: Chaitanya Kulkarni
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Coly Li
     
  • This patch adds a return value to register_bdev(), so that if a
    failure happens inside register_bdev(), its caller register_bcache()
    can detect and handle the failure properly.

    Signed-off-by: Coly Li
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Coly Li
     
  • When a failure happens inside bch_journal_replay(), calling
    cache_set_err_on() and handling the failure asynchronously is not a
    good idea. After bch_journal_replay() returns, the registering code
    continues to execute the following steps while the unregistering code
    triggered by cache_set_err_on() runs at the same time. First, it is
    unnecessary to handle the failure and unregister the cache set
    asynchronously; second, there might be a potential race condition
    between the register and unregister code running for the same cache
    set.

    So in this patch, if a failure happens in bch_journal_replay(), we
    don't call cache_set_err_on(); we just print the same error message
    to the kernel message buffer, then return -EIO immediately to the
    caller. The caller can then detect such a failure and handle it in a
    synchronized way.

    Signed-off-by: Coly Li
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Coly Li
     
  • Bcache has several routines to release resources in implicit way, they
    are called when the associated kobj released. This patch adds code
    comments to notice when and which release callback will be called,
    - When dc->disk.kobj released:
    void bch_cached_dev_release(struct kobject *kobj)
    - When d->kobj released:
    void bch_flash_dev_release(struct kobject *kobj)
    - When c->kobj released:
    void bch_cache_set_release(struct kobject *kobj)
    - When ca->kobj released
    void bch_cache_release(struct kobject *kobj)

    Signed-off-by: Coly Li
    Reviewed-by: Chaitanya Kulkarni
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Coly Li
     
  • Currently run_cache_set() has no return value, so if there is a
    failure in bch_journal_replay(), the caller of run_cache_set() has no
    idea about it and just continues to execute the code following
    run_cache_set(). The internal failure is triggered inside
    bch_journal_replay() and handled asynchronously. This behavior is
    inefficient: while the failure is being handled inside
    bch_journal_replay(), the cache register code is still running to
    start the cache set. Registering and unregistering code running at
    the same time may introduce rare race conditions and makes the code
    harder to understand.

    This patch adds a return value to run_cache_set(), and returns -EIO
    if bch_journal_replay() fails. The caller of run_cache_set() can then
    detect such a failure and stop the registering code flow immediately
    inside register_cache_set().

    If journal replay fails, run_cache_set() can report the error
    immediately to register_cache_set(). This patch makes the failure
    handling for bch_journal_replay() synchronized, easier to understand
    and debug, and avoids the potential race condition of registering and
    unregistering at the same time.

    Signed-off-by: Coly Li
    Signed-off-by: Jens Axboe

    Coly Li
     
  • In journal_reclaim(), ja->cur_idx of each cache will be updated to
    reclaim available journal buckets. Variable 'int n' is used to count
    how many caches are successfully reclaimed, then n is stored into
    c->journal.key by SET_KEY_PTRS(). Later in journal_write_unlocked(),
    a for_each_cache() loop will write the jset data onto each cache.

    The problem is, if all journal buckets on each cache are full, in the
    following code in journal_reclaim(),

    529 for_each_cache(ca, c, iter) {
    530         struct journal_device *ja = &ca->journal;
    531         unsigned int next = (ja->cur_idx + 1) % ca->sb.njournal_buckets;
    532
    533         /* No space available on this device */
    534         if (next == ja->discard_idx)
    535                 continue;
    536
    537         ja->cur_idx = next;
    538         k->ptr[n++] = MAKE_PTR(0,
    539                 bucket_to_sector(c, ca->sb.d[ja->cur_idx]),
    540                 ca->sb.nr_this_dev);
    541 }
    542
    543 bkey_init(k);
    544 SET_KEY_PTRS(k, n);

    if there is no available bucket to reclaim, the if() condition at
    line 534 will always be true, and n remains 0. Then at line 544,
    SET_KEY_PTRS() will set the KEY_PTRS field of c->journal.key to 0.

    Setting the KEY_PTRS field of c->journal.key to 0 is wrong, because
    in journal_write_unlocked() the journal data is written in the
    following loop,

    649 for (i = 0; i < KEY_PTRS(k); i++) {
    650-671         submit journal data to cache device
    672 }

    If the KEY_PTRS field is set to 0 in journal_reclaim(), the journal
    data won't be written to the cache device here. If the system crashed
    or rebooted before the bkeys of the lost journal entries were written
    into btree nodes, data corruption will be reported during bcache
    reload after rebooting the system.

    Since there is only one cache in a cache set, there is no need to set
    the KEY_PTRS field in journal_reclaim() at all. But in order to keep
    the for_each_cache() logic consistent for now, this patch fixes the
    above problem by not setting KEY_PTRS of the journal key to 0 when
    there is no bucket available to reclaim.

    Signed-off-by: Coly Li
    Reviewed-by: Hannes Reinecke
    Cc: stable@vger.kernel.org
    Signed-off-by: Jens Axboe

    Coly Li
     
  • 'int ret' is defined as a local variable inside macro read_bucket().
    Since this macro is called multiple times, and the following patches
    will use an 'int ret' variable in bch_journal_read(), this patch
    moves the definition of 'int ret' from macro read_bucket() into the
    scope of function bch_journal_read().

    Signed-off-by: Coly Li
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Coly Li
     
  • There is a race between cache device register and cache set unregister.
    For an already registered cache device, register_bcache will call
    bch_is_open to iterate through all cachesets and check every cache
    there. The race occurs if cache_set_free executes at the same time and
    clears the caches right before ca is dereferenced in bch_is_open_cache.
    To close the race, let's make sure the clean up work is protected by
    the bch_register_lock as well.

    This issue can be reproduced as follows,
    while true; do echo /dev/XXX > /sys/fs/bcache/register ; done &
    while true; do echo 1 > /sys/block/XXX/bcache/set/unregister ; done &

    and results in the following oops,

    [ +0.000053] BUG: unable to handle kernel NULL pointer dereference at 0000000000000998
    [ +0.000457] #PF error: [normal kernel read fault]
    [ +0.000464] PGD 800000003ca9d067 P4D 800000003ca9d067 PUD 3ca9c067 PMD 0
    [ +0.000388] Oops: 0000 [#1] SMP PTI
    [ +0.000269] CPU: 1 PID: 3266 Comm: bash Not tainted 5.0.0+ #6
    [ +0.000346] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.fc28 04/01/2014
    [ +0.000472] RIP: 0010:register_bcache+0x1829/0x1990 [bcache]
    [ +0.000344] Code: b0 48 83 e8 50 48 81 fa e0 e1 10 c0 0f 84 a9 00 00 00 48 89 c6 48 89 ca 0f b7 ba 54 04 00 00 4c 8b 82 60 0c 00 00 85 ff 74 2f 3b a8 98 09 00 00 74 4e 44 8d 47 ff 31 ff 49 c1 e0 03 eb 0d
    [ +0.000839] RSP: 0018:ffff92ee804cbd88 EFLAGS: 00010202
    [ +0.000328] RAX: ffffffffc010e190 RBX: ffff918b5c6b5000 RCX: ffff918b7d8e0000
    [ +0.000399] RDX: ffff918b7d8e0000 RSI: ffffffffc010e190 RDI: 0000000000000001
    [ +0.000398] RBP: ffff918b7d318340 R08: 0000000000000000 R09: ffffffffb9bd2d7a
    [ +0.000385] R10: ffff918b7eb253c0 R11: ffffb95980f51200 R12: ffffffffc010e1a0
    [ +0.000411] R13: fffffffffffffff2 R14: 000000000000000b R15: ffff918b7e232620
    [ +0.000384] FS: 00007f955bec2740(0000) GS:ffff918b7eb00000(0000) knlGS:0000000000000000
    [ +0.000420] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ +0.000801] CR2: 0000000000000998 CR3: 000000003cad6000 CR4: 00000000001406e0
    [ +0.000837] Call Trace:
    [ +0.000682] ? _cond_resched+0x10/0x20
    [ +0.000691] ? __kmalloc+0x131/0x1b0
    [ +0.000710] kernfs_fop_write+0xfa/0x170
    [ +0.000733] __vfs_write+0x2e/0x190
    [ +0.000688] ? inode_security+0x10/0x30
    [ +0.000698] ? selinux_file_permission+0xd2/0x120
    [ +0.000752] ? security_file_permission+0x2b/0x100
    [ +0.000753] vfs_write+0xa8/0x1a0
    [ +0.000676] ksys_write+0x4d/0xb0
    [ +0.000699] do_syscall_64+0x3a/0xf0
    [ +0.000692] entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Signed-off-by: Liang Chen
    Cc: stable@vger.kernel.org
    Signed-off-by: Coly Li
    Signed-off-by: Jens Axboe

    Liang Chen
     
  • There are a few nits in this function. They could in theory all
    be separate patches, but that's probably taking small commits
    too far.

    1) I added a brief comment saying what it does.

    2) I like to declare pointer parameters "const" where possible
    for documentation reasons.

    3) It uses bitmap_weight(&rand, BITS_PER_LONG) to compute the Hamming
    weight of a 32-bit random number (giving a random integer with
    mean 16 and variance 8). Passing by reference in a 64-bit variable
    is silly; just use hweight32().

    4) Its helper function fract_exp_two is unnecessarily tangled.
    Gcc can optimize the multiply by (1 << x) to a shift, but it can
    be written in a much more straightforward way at the cost of one
    more bit of internal precision. Some analysis reveals that this
    bit is always available.

    This shrinks the object code for fract_exp_two(x, 6) from 23 bytes:

    0000000000000000 :
    0: 89 f9 mov %edi,%ecx
    2: c1 e9 06 shr $0x6,%ecx
    5: b8 01 00 00 00 mov $0x1,%eax
    a: d3 e0 shl %cl,%eax
    c: 83 e7 3f and $0x3f,%edi
    f: d3 e7 shl %cl,%edi
    11: c1 ef 06 shr $0x6,%edi
    14: 01 f8 add %edi,%eax
    16: c3 retq

    To 19 bytes:

    0000000000000017 :
    17: 89 f8 mov %edi,%eax
    19: 83 e0 3f and $0x3f,%eax
    1c: 83 c0 40 add $0x40,%eax
    1f: 89 f9 mov %edi,%ecx
    21: c1 e9 06 shr $0x6,%ecx
    24: d3 e0 shl %cl,%eax
    26: c1 e8 06 shr $0x6,%eax
    29: c3 retq

    (Verified with 0 < 16<
    Signed-off-by: Coly Li
    Signed-off-by: Jens Axboe

    George Spelvin
     
  • This patch uses kmemdup_nul to create a NUL-terminated string from
    dc->sb.label. This is better than open coding it.

    With this, we can move env[2] initialization into env[] array to make
    code more elegant.

    Signed-off-by: Geliang Tang
    Signed-off-by: Coly Li
    Signed-off-by: Jens Axboe

    Geliang Tang
     
  • clang has identified a code path in which it thinks a
    variable may be used uninitialized:

    drivers/md/bcache/alloc.c:333:4: error: variable 'bucket' is used uninitialized whenever 'if' condition is false
    [-Werror,-Wsometimes-uninitialized]
    fifo_pop(&ca->free_inc, bucket);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    drivers/md/bcache/util.h:219:27: note: expanded from macro 'fifo_pop'
    #define fifo_pop(fifo, i) fifo_pop_front(fifo, (i))
    ^~~~~~~~~~~~~~~~~~~~~~~~~
    drivers/md/bcache/util.h:189:6: note: expanded from macro 'fifo_pop_front'
    if (_r) { \
    ^~
    drivers/md/bcache/alloc.c:343:46: note: uninitialized use occurs here
    allocator_wait(ca, bch_allocator_push(ca, bucket));
    ^~~~~~
    drivers/md/bcache/alloc.c:287:7: note: expanded from macro 'allocator_wait'
    if (cond) \
    ^~~~
    drivers/md/bcache/alloc.c:333:4: note: remove the 'if' if its condition is always true
    fifo_pop(&ca->free_inc, bucket);
    ^
    drivers/md/bcache/util.h:219:27: note: expanded from macro 'fifo_pop'
    #define fifo_pop(fifo, i) fifo_pop_front(fifo, (i))
    ^
    drivers/md/bcache/util.h:189:2: note: expanded from macro 'fifo_pop_front'
    if (_r) { \
    ^
    drivers/md/bcache/alloc.c:331:15: note: initialize the variable 'bucket' to silence this warning
    long bucket;
    ^

    This cannot happen in practice because we only enter the loop
    if there is at least one element in the list.

    Slightly rearranging the code makes this clearer to both the
    reader and the compiler, which avoids the warning.

    Signed-off-by: Arnd Bergmann
    Reviewed-by: Nathan Chancellor
    Signed-off-by: Coly Li
    Signed-off-by: Jens Axboe

    Arnd Bergmann
     
  • To get the number of unused buckets in sysfs_priority_stats, the
    code counts the buckets whose GC_SECTORS_USED is zero. That is
    correct and should not be overwritten by the count of buckets whose
    prio is zero.

    Signed-off-by: Guoju Fang
    Signed-off-by: Coly Li
    Signed-off-by: Jens Axboe

    Guoju Fang
     
  • The bio from the upper layer is considered completed when
    bio_complete() returns. In most scenarios bio_complete() is called in
    search_free(), but when a read miss happens, bio_complete() is called
    when the backing device read completes, while the struct search is
    still in use until cache insertion finishes.

    If someone stops the bcache device just then, the device may be
    closed and released, and after cache insertion finishes the struct
    search will access a freed struct cached_dev.

    This patch adds a reference to the bcache device before
    bio_complete() when a read miss happens, and puts it after the search
    is no longer used.

    Signed-off-by: Guoju Fang
    Signed-off-by: Coly Li
    Signed-off-by: Jens Axboe

    Guoju Fang
     

17 Apr, 2019

3 commits

  • The problem is that any 'uptodate' vs 'disks' check is not precise
    in this path. Put a WARN_ON(!test_bit(R5_UPTODATE, &dev->flags)) on
    the device that might try to kick off writes and then skip the
    action. Better to prevent the raid driver from taking unexpected
    action *and* keep the system alive than to kill the machine with
    BUG_ON.

    Note: fixed warning reported by kbuild test robot

    Signed-off-by: Dan Williams
    Signed-off-by: Nigel Croxon
    Signed-off-by: Song Liu

    Nigel Croxon
     
  • This reverts commit 4f4fd7c5798bbdd5a03a60f6269cf1177fbd11ef.

    Cc: Dan Williams
    Cc: Nigel Croxon
    Cc: Xiao Ni
    Signed-off-by: Song Liu

    Song Liu
     
  • Mdadm expects that setting a drive as faulty will fail with -EBUSY
    only if this operation would cause the RAID to fail. If this happens,
    it will try to stop the array. Currently -EBUSY might also be
    returned if the rdev is in the middle of the removal process - for
    example, there is a race with mdmon that has already requested the
    drive to be failed/removed.

    If rdev does not contain mddev, return -ENODEV instead, so the caller
    can distinguish between those two cases and behave accordingly.

    Reviewed-by: NeilBrown
    Signed-off-by: Pawel Baldysiak
    Signed-off-by: Song Liu

    Pawel Baldysiak
     

15 Apr, 2019

1 commit

  • Pull in v5.1-rc5 to resolve two conflicts. One is in BFQ, in just
    a comment, and is trivial. The other one is a conflict due to a
    later fix in the bio multi-page work, and needs a bit more care.

    * tag 'v5.1-rc5': (476 commits)
    Linux 5.1-rc5
    fs: prevent page refcount overflow in pipe_buf_get
    mm: prevent get_user_pages() from overflowing page refcount
    mm: add 'try_get_page()' helper function
    mm: make page ref count overflow check tighter and more explicit
    clk: imx: Fix PLL_1416X not rounding rates
    clk: mediatek: fix clk-gate flag setting
    arm64: futex: Fix FUTEX_WAKE_OP atomic ops with non-zero result value
    iommu/amd: Set exclusion range correctly
    clang-format: Update with the latest for_each macro list
    perf/core: Fix perf_event_disable_inatomic() race
    block: fix the return errno for direct IO
    Revert "SUNRPC: Micro-optimise when the task is known not to be sleeping"
    NFSv4.1 fix incorrect return value in copy_file_range
    xprtrdma: Fix helper that drains the transport
    NFS: Fix handling of reply page vector
    NFS: Forbid setting AF_INET6 to "struct sockaddr_in"->sin_family.
    dma-debug: only skip one stackframe entry
    platform/x86: pmc_atom: Drop __initconst on dmi table
    nvmet: fix discover log page when offsets are used
    ...

    Signed-off-by: Jens Axboe

    Jens Axboe
     

11 Apr, 2019

8 commits


07 Apr, 2019

1 commit

  • Currently support for 64-bit sector_t and blkcnt_t is optional on 32-bit
    architectures. These types are required to support block device and/or
    file sizes larger than 2 TiB, and have generally defaulted to on for
    a long time. Enabling the option only increases the i386 tinyconfig
    size by 145 bytes, and many data structures already always use
    64-bit values for their in-core and on-disk data structures anyway,
    so there should not be a large change in dynamic memory usage either.

    Dropping this option removes a somewhat weird non-default config that
    has caused various bugs and compiler warnings when actually used.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig