

22 Jul, 2020

1 commit

  • Convert the macros STRIPE_SIZE, STRIPE_SECTORS and STRIPE_SHIFT to
    RAID5_STRIPE_SIZE(), RAID5_STRIPE_SECTORS() and RAID5_STRIPE_SHIFT().

    This patch prepares for the upcoming adjustable stripe_size.
    It does not change any existing functionality.

    Signed-off-by: Yufen Yu
    Signed-off-by: Song Liu

    Yufen Yu
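As a sketch of the conversion (illustrative stand-in struct and values, not the kernel source), the fixed compile-time constants become accessors on the per-array configuration:

```c
#include <assert.h>

/* Illustrative sketch only: DEMO_PAGE_SIZE and the stub struct stand in
 * for the kernel's PAGE_SIZE and struct r5conf. */
#define DEMO_PAGE_SIZE 4096UL

struct r5conf {
    unsigned long stripe_size;  /* today always PAGE_SIZE; adjustable later */
};

/* Old: #define STRIPE_SIZE PAGE_SIZE -- a fixed compile-time constant.
 * New: the value is read from the per-array configuration. */
#define RAID5_STRIPE_SIZE(conf)    ((conf)->stripe_size)
#define RAID5_STRIPE_SECTORS(conf) (RAID5_STRIPE_SIZE(conf) >> 9)

unsigned long demo_stripe_sectors(void)
{
    struct r5conf conf = { .stripe_size = DEMO_PAGE_SIZE };
    return RAID5_STRIPE_SECTORS(&conf);  /* 4096 bytes -> sectors */
}
```

Because every call site now takes a conf argument, a later patch can vary stripe_size per array without touching callers again.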
     



05 Jun, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms and conditions of the gnu general public license
    version 2 as published by the free software foundation this program
    is distributed in the hope it will be useful but without any
    warranty without even the implied warranty of merchantability or
    fitness for a particular purpose see the gnu general public license
    for more details

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 263 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Reviewed-by: Alexios Zavras
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190529141901.208660670@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

29 Jan, 2019

1 commit

  • This fixes the case where md array assembly fails because raid cache recovery
    is unable to allocate a stripe, despite attempts to replay stripes and increase
    the cache size. This happens because stripes released by r5c_recovery_replay_stripes
    and raid5_set_cache_size do not become available for allocation immediately.
    Released stripes are first placed on the conf->released_stripes list and require
    the md thread to merge them onto conf->inactive_list before they can be allocated.

    The patch allows the final allocation attempt during cache recovery to wait for
    new stripes to become available for allocation.

    Cc: linux-raid@vger.kernel.org
    Cc: Shaohua Li
    Cc: linux-stable # 4.10+
    Fixes: b4c625c67362 ("md/r5cache: r5cache recovery: part 1")
    Signed-off-by: Alexei Naberezhnov
    Signed-off-by: Song Liu

    Alexei Naberezhnov
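The released-vs-inactive distinction can be sketched in userspace C (names and the merge step are illustrative; in the kernel the final attempt waits via wait_event() for the md thread to perform the merge):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the kernel's r5conf lists and
 * raid5_get_active_stripe(). */
struct demo_conf {
    int released_stripes;   /* freed, but not yet reusable */
    int inactive_stripes;   /* reusable */
};

/* md-thread work: merge released stripes onto the inactive list */
static void demo_merge_released(struct demo_conf *conf)
{
    conf->inactive_stripes += conf->released_stripes;
    conf->released_stripes = 0;
}

/* noblock=true mirrors the old final attempt: fail if nothing is inactive.
 * noblock=false mirrors the fix: wait for the merge (modeled by calling it
 * directly; the kernel uses wait_event()) before retrying. */
static bool demo_get_stripe(struct demo_conf *conf, bool noblock)
{
    if (!conf->inactive_stripes) {
        if (noblock)
            return false;
        demo_merge_released(conf);
    }
    if (!conf->inactive_stripes)
        return false;
    conf->inactive_stripes--;
    return true;
}

static bool demo_old_final_attempt(void)
{
    struct demo_conf conf = { .released_stripes = 1 };
    return demo_get_stripe(&conf, true);   /* fails: stripe not merged yet */
}

static bool demo_new_final_attempt(void)
{
    struct demo_conf conf = { .released_stripes = 1 };
    return demo_get_stripe(&conf, false);  /* waits, then succeeds */
}
```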
     

11 Oct, 2018

1 commit

  • An earlier commit removed the error label, leaving two statements that
    are now never reachable. Since this code is now dead code, remove it.

    Detected by CoverityScan, CID#1462409 ("Structurally dead code")

    Fixes: d5d885fd514f ("md: introduce new personality function start()")
    Signed-off-by: Colin Ian King
    Signed-off-by: Shaohua Li

    Colin Ian King
     

19 Aug, 2018

1 commit

  • Pull input updates from Dmitry Torokhov:

    - a new driver for Rohm BU21029 touch controller

    - new bitmap APIs: bitmap_alloc, bitmap_zalloc and bitmap_free

    - updates to Atmel, eeti, pxrc and iforce drivers

    - assorted driver cleanups and fixes.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (57 commits)
    MAINTAINERS: Add PhoenixRC Flight Controller Adapter
    Input: do not use WARN() in input_alloc_absinfo()
    Input: mark expected switch fall-throughs
    Input: raydium_i2c_ts - use true and false for boolean values
    Input: evdev - switch to bitmap API
    Input: gpio-keys - switch to bitmap_zalloc()
    Input: elan_i2c_smbus - cast sizeof to int for comparison
    bitmap: Add bitmap_alloc(), bitmap_zalloc() and bitmap_free()
    md: Avoid namespace collision with bitmap API
    dm: Avoid namespace collision with bitmap API
    Input: pm8941-pwrkey - add resin entry
    Input: pm8941-pwrkey - abstract register offsets and event code
    Input: iforce - reorganize joystick configuration lists
    Input: atmel_mxt_ts - move completion to after config crc is updated
    Input: atmel_mxt_ts - don't report zero pressure from T9
    Input: atmel_mxt_ts - zero terminate config firmware file
    Input: atmel_mxt_ts - refactor config update code to add context struct
    Input: atmel_mxt_ts - config CRC may start at T71
    Input: atmel_mxt_ts - remove unnecessary debug on ENOMEM
    Input: atmel_mxt_ts - remove duplicate setup of ABS_MT_PRESSURE
    ...

    Linus Torvalds
     

02 Aug, 2018

1 commit

  • bitmap API (include/linux/bitmap.h) has 'bitmap' prefix for its methods.

    The MD bitmap API, on the other hand, is a special case.
    Add an 'md' prefix to it to avoid the namespace collision.

    No functional changes intended.

    Signed-off-by: Andy Shevchenko
    Acked-by: Shaohua Li
    Signed-off-by: Dmitry Torokhov

    Andy Shevchenko
     



16 Jan, 2018

1 commit

  • In order to provide data consistency with PPL for disks with write-back
    cache enabled all data has to be flushed to disks before next PPL
    entry. The disks to be flushed are marked in the bitmap. It's modified
    under a mutex and it's only read after PPL io unit is submitted.

    A limitation of 64 disks in the array has been introduced to keep the data
    structures and implementation simple. RAID5 arrays with that many disks are
    unlikely due to the high risk of multiple disk failures, so this restriction
    should not be a real-life limitation.

    With write-back cache disabled, the next PPL entry is submitted when the data
    write for the current one completes. A data flush defers the next log
    submission, so trigger it when no stripes are found for handling.

    As PPL assures all data is flushed to disk at request completion, just
    acknowledge flush request when PPL is enabled.

    Signed-off-by: Tomasz Majchrzak
    Signed-off-by: Shaohua Li

    Tomasz Majchrzak
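A minimal sketch of the per-disk flush bitmap (illustrative names, not the kernel code): one bit per member disk in a 64-bit word, which is where the 64-disk cap comes from:

```c
#include <assert.h>
#include <stdint.h>

/* One bit per member disk; a uint64_t caps the array at 64 disks. */
static uint64_t ppl_mark_disk(uint64_t bitmap, int disk)
{
    return bitmap | (UINT64_C(1) << disk);
}

static int ppl_disk_needs_flush(uint64_t bitmap, int disk)
{
    return (int)((bitmap >> disk) & 1);
}
```

In the patch the bitmap is modified under a mutex and only read after the PPL io unit is submitted, so no atomics are needed for the read side.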
     



12 Dec, 2017

1 commit

  • In do_md_run(), md threads should not wake up until the array is fully
    initialized in md_run(). However, in raid5_run(), raid5-cache may wake
    up mddev->thread to flush stripes that need to be written back. This
    design doesn't break badly right now, but it could lead to a bad bug in
    the future.

    This patch tries to resolve this problem by splitting the start-up work
    into two personality functions, run() and start(). Tasks that do not
    require the md threads should go into run(), while tasks that require
    the md threads go into start().

    r5l_load_log() is moved to raid5_start(), so it is not called until
    the md threads are started in do_md_run().

    Signed-off-by: Song Liu
    Signed-off-by: Shaohua Li

    Song Liu
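The run()/start() split can be sketched as follows (stub types; the real hooks live in struct md_personality and take a struct mddev argument):

```c
#include <assert.h>
#include <stddef.h>

/* Stub sketch of the split personality entry points. */
struct demo_personality {
    int (*run)(void);    /* md threads not running yet: build data structures */
    int (*start)(void);  /* md threads running: e.g. r5l_load_log() */
};

static int demo_order;
static int demo_run(void)   { demo_order = demo_order * 10 + 1; return 0; }
static int demo_start(void) { demo_order = demo_order * 10 + 2; return 0; }

/* do_md_run() pattern: run(), start the md threads, then start() */
static int demo_do_md_run(const struct demo_personality *pers)
{
    int err = pers->run();
    if (err)
        return err;
    /* ... md threads are started here ... */
    return pers->start ? pers->start() : 0;
}

static int demo_call_sequence(void)
{
    static const struct demo_personality pers = { demo_run, demo_start };
    demo_order = 0;
    demo_do_md_run(&pers);
    return demo_order;   /* records run() strictly before start() */
}
```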
     

02 Dec, 2017

1 commit

  • r5c_journal_mode_set() is called by r5c_journal_mode_store() and
    raid_ctr() in dm-raid. We don't need mddev_lock() when calling from
    raid_ctr(). This patch fixes this by moving the mddev_lock() to
    r5c_journal_mode_store().

    Cc: stable@vger.kernel.org (v4.13+)
    Signed-off-by: Song Liu
    Signed-off-by: Shaohua Li

    Song Liu
     

02 Nov, 2017

3 commits

  • lockdep_assert_held is a better way to assert lock held, and it works
    for UP.

    Signed-off-by: Shaohua Li

    Shaohua Li
     
  • The '2' argument means "wake up anything that is waiting".
    This is an inelegant part of the design and was added
    to help support management of the suspend_lo/suspend_hi setting.
    Now that suspend_lo/hi is managed in mddev_suspend/resume,
    that need is gone.
    There are still a couple of places where we call 'quiesce'
    with an argument of '2', but they can safely be changed to
    call ->quiesce(.., 1); ->quiesce(.., 0), which
    achieves the same result at the small cost of pausing IO
    briefly.

    This removes a small "optimization" from suspend_{hi,lo}_store,
    but it isn't clear that optimization served a useful purpose.
    The code now is a lot clearer.

    Suggested-by: Shaohua Li
    Signed-off-by: NeilBrown
    Signed-off-by: Shaohua Li

    NeilBrown
     
  • Most often mddev_suspend() is called with
    reconfig_mutex held. Make this a requirement in
    preparation for a subsequent patch. Also require
    reconfig_mutex to be held for mddev_resume(),
    partly for symmetry and partly to guarantee
    no races with incr/decr of mddev->suspend.

    Taking the mutex in r5c_disable_writeback_async() is
    a little tricky as this is called from a work queue
    via log->disable_writeback_work, and flush_work()
    is called on that while holding ->reconfig_mutex.
    If the work item hasn't run before flush_work()
    is called, the work function will not be able to
    get the mutex.

    So we use mddev_trylock() inside the wait_event() call, and have that
    abort when conf->log is set to NULL, which happens before
    flush_work() is called.
    We wait in mddev->sb_wait and ensure this is woken
    when any of the conditions change. This requires
    waking mddev->sb_wait in mddev_unlock(). This is only
    likely to trigger extra wake-ups of threads that needn't
    be woken when metadata is being written, and that
    doesn't happen often enough for the cost to be
    noticeable.

    Signed-off-by: NeilBrown
    Signed-off-by: Shaohua Li

    NeilBrown
     

17 Oct, 2017

1 commit

  • Motivated by the desire to eliminate the imprecise nature of
    DM-specific patches being unnecessarily sent to both the MD maintainer
    and mailing list, which is borne out of the fact that DM files also
    reside in drivers/md/.

    Now all MD-specific files in drivers/md/ start with either "raid" or
    "md-" and the MAINTAINERS file has been updated accordingly.

    Shaohua: don't change module name

    Signed-off-by: Mike Snitzer
    Signed-off-by: Shaohua Li

    Mike Snitzer
     

08 Sep, 2017

2 commits

  • Pull MD updates from Shaohua Li:
    "This update mainly fixes bugs:

    - Make raid5 ppl support several ppl from Pawel

    - Several raid5-cache bug fixes from Song

    - Bitmap fixes from Neil and Me

    - One raid1/10 regression fix since 4.12 from Me

    - Other small fixes and cleanup"

    * tag 'md/4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
    md/bitmap: disable bitmap_resize for file-backed bitmaps.
    raid5-ppl: Recovery support for multiple partial parity logs
    md: Runtime support for multiple ppls
    md/raid0: attach correct cgroup info in bio
    lib/raid6: align AVX512 constants to 512 bits, not bytes
    raid5: remove raid5_build_block
    md/r5cache: call mddev_lock/unlock() in r5c_journal_mode_show
    md: replace seq_release_private with seq_release
    md: notify about new spare disk in the container
    md/raid1/10: reset bio allocated from mempool
    md/raid5: release/flush io in raid5_do_work()
    md/bitmap: copy correct data for bitmap super

    Linus Torvalds
     
  • Pull block layer updates from Jens Axboe:
    "This is the first pull request for 4.14, containing most of the code
    changes. It's a quiet series this round, which I think we needed after
    the churn of the last few series. This contains:

    - Fix for a registration race in loop, from Anton Volkov.

    - Overflow complaint fix from Arnd for DAC960.

    - Series of drbd changes from the usual suspects.

    - Conversion of the stec/skd driver to blk-mq. From Bart.

    - A few BFQ improvements/fixes from Paolo.

    - CFQ improvement from Ritesh, allowing idling for group idle.

    - A few fixes found by Dan's smatch, courtesy of Dan.

    - A warning fixup for a race between changing the IO scheduler and
    device removal. From David Jeffery.

    - A few nbd fixes from Josef.

    - Support for cgroup info in blktrace, from Shaohua.

    - Also from Shaohua, new features in the null_blk driver to allow it
    to actually hold data, among other things.

    - Various corner cases and error handling fixes from Weiping Zhang.

    - Improvements to the IO stats tracking for blk-mq from me. Can
    drastically improve performance for fast devices and/or big
    machines.

    - Series from Christoph removing bi_bdev as being needed for IO
    submission, in preparation for nvme multipathing code.

    - Series from Bart, including various cleanups and fixes for switch
    fall through case complaints"

    * 'for-4.14/block' of git://git.kernel.dk/linux-block: (162 commits)
    kernfs: checking for IS_ERR() instead of NULL
    drbd: remove BIOSET_NEED_RESCUER flag from drbd_{md_,}io_bio_set
    drbd: Fix allyesconfig build, fix recent commit
    drbd: switch from kmalloc() to kmalloc_array()
    drbd: abort drbd_start_resync if there is no connection
    drbd: move global variables to drbd namespace and make some static
    drbd: rename "usermode_helper" to "drbd_usermode_helper"
    drbd: fix race between handshake and admin disconnect/down
    drbd: fix potential deadlock when trying to detach during handshake
    drbd: A single dot should be put into a sequence.
    drbd: fix rmmod cleanup, remove _all_ debugfs entries
    drbd: Use setup_timer() instead of init_timer() to simplify the code.
    drbd: fix potential get_ldev/put_ldev refcount imbalance during attach
    drbd: new disk-option disable-write-same
    drbd: Fix resource role for newly created resources in events2
    drbd: mark symbols static where possible
    drbd: Send P_NEG_ACK upon write error in protocol != C
    drbd: add explicit plugging when submitting batches
    drbd: change list_for_each_safe to while(list_first_entry_or_null)
    drbd: introduce drbd_recv_header_maybe_unplug
    ...

    Linus Torvalds
     



24 Aug, 2017

1 commit

  • This way we don't need a block_device structure to submit I/O. The
    block_device has different life time rules from the gendisk and
    request_queue and is usually only available when the block device node
    is open. Other callers need to explicitly create one (e.g. the lightnvm
    passthrough code, or the new nvme multipathing code).

    For the actual I/O path all that we need is the gendisk, which exists
    once per block device. But given that the block layer also does
    partition remapping we additionally need a partition index, which is
    used for said remapping in generic_make_request.

    Note that all the block drivers generally want request_queue or
    sometimes the gendisk, so this removes a layer of indirection all
    over the stack.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
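A rough sketch of the remapping this enables (all types below are stand-ins for the kernel's gendisk, partition table and bio fields):

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Stand-ins for struct gendisk / partition / bio. */
struct demo_part    { sector_t start_sect; };
struct demo_gendisk { struct demo_part part[4]; };
struct demo_bio {
    struct demo_gendisk *bi_disk;   /* replaces bi_bdev */
    int bi_partno;                  /* partition index for remapping */
    sector_t bi_sector;             /* relative to the partition */
};

/* generic_make_request-style remap: partition-relative to whole-disk */
static sector_t demo_remap(const struct demo_bio *bio)
{
    return bio->bi_disk->part[bio->bi_partno].start_sect + bio->bi_sector;
}

static sector_t demo_remap_example(void)
{
    struct demo_gendisk disk = { .part = { { 0 }, { 2048 } } };
    struct demo_bio bio = { .bi_disk = &disk, .bi_partno = 1, .bi_sector = 8 };
    return demo_remap(&bio);   /* partition start + offset */
}
```

The point of the change is that submitters only need the gendisk plus a partition index, not an open block_device.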
     

08 Aug, 2017

2 commits

  • In r5l_log_endio(), once log->io_list_lock is released, the io unit
    may be accessed (or even freed) by other threads. Current code
    doesn't handle the io_unit properly, which leads to potential race
    conditions.

    This patch solves this race condition by:

    1. Add a pending_stripe count for flush_payload. Multiple flush_payloads
    are counted as only one pending_stripe. A has_flush_payload flag is
    added to show whether the io unit has a flush_payload;
    2. In r5l_log_endio(), check flags has_null_flush and
    has_flush_payload with log->io_list_lock held. After the lock
    is released, this IO unit is only accessed when we know the
    pending_stripe counter cannot be zeroed by other threads.

    Signed-off-by: Song Liu
    Signed-off-by: Shaohua Li

    Song Liu
     
  • In r5c_journal_mode_set(), it is necessary to call mddev_lock()
    before accessing conf and conf->log. Otherwise, the conf->log
    may change (and become NULL).

    Shaohua: fix unlock in failure cases

    Signed-off-by: Song Liu
    Signed-off-by: Shaohua Li

    Song Liu
     

19 Jun, 2017

1 commit

  • "flags" arguments are often seen as good API design as they allow
    easy extensibility.
    bioset_create_nobvec() is implemented internally as a variation in
    flags passed to __bioset_create().

    To support future extension, make the internal structure part of the
    API.
    i.e. add a 'flags' argument to bioset_create() and discard
    bioset_create_nobvec().

    Note that the bio_split allocations in drivers/md/raid* do not need
    the bvec mempool - they should have used bioset_create_nobvec().

    Suggested-by: Christoph Hellwig
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Ming Lei
    Signed-off-by: NeilBrown
    Signed-off-by: Jens Axboe

    NeilBrown
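A sketch of the API change (stubbed, not the block layer; the real constructor allocates mempools and the flag is BIOSET_NEED_BVECS):

```c
#include <assert.h>

/* Stub sketch: one constructor with a flags word replaces the
 * bioset_create()/bioset_create_nobvec() pair. The flag asks for the
 * bvec mempool that the old bioset_create() always set up. */
#define DEMO_BIOSET_NEED_BVECS (1u << 0)

struct demo_bioset { int has_bvec_pool; };

static void demo_bioset_create(struct demo_bioset *bs, unsigned int flags)
{
    bs->has_bvec_pool = !!(flags & DEMO_BIOSET_NEED_BVECS);
}

static int demo_has_bvec_pool(unsigned int flags)
{
    struct demo_bioset bs;
    demo_bioset_create(&bs, flags);
    return bs.has_bvec_pool;
}
```

Callers that previously used bioset_create_nobvec() now simply pass flags without the bvec bit.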
     

12 Jun, 2017

1 commit

  • We've already got a few conflicts and upcoming work depends on some of the
    changes that have gone into mainline as regression fixes for this series.

    Pull in 4.12-rc5 to resolve these conflicts and make it easier on downstream
    trees to continue working on 4.13 changes.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

09 Jun, 2017

1 commit

  • Replace bi_error with a new bi_status to allow for a clear conversion.
    Note that device mapper overloaded bi_error with a private value, which
    we'll have to keep around at least for now and thus propagate to a
    proper blk_status_t value.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
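The idea can be sketched like this (values are illustrative; the kernel defines its own BLK_STS_* table and blk_status_to_errno()):

```c
#include <assert.h>

/* Sketch: a dedicated status type instead of a raw negative errno in
 * bi_error; conversion to errno happens only at the boundaries. */
typedef unsigned char demo_blk_status_t;
enum { DEMO_BLK_STS_OK = 0, DEMO_BLK_STS_IOERR = 10 };
#define DEMO_EIO 5

static int demo_blk_status_to_errno(demo_blk_status_t status)
{
    return status == DEMO_BLK_STS_OK ? 0 : -DEMO_EIO;
}
```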
     

01 Jun, 2017

1 commit

  • Commit b685d3d65ac7 "block: treat REQ_FUA and REQ_PREFLUSH as
    synchronous" removed REQ_SYNC flag from WRITE_{FUA|PREFLUSH|...}
    definitions. generic_make_request_checks() however strips REQ_FUA and
    REQ_PREFLUSH flags from a bio when the storage doesn't report volatile
    write cache, and thus the write effectively becomes asynchronous, which can
    lead to performance regressions.

    Fix the problem by making sure all bios which are synchronous are
    properly marked with REQ_SYNC.

    CC: linux-raid@vger.kernel.org
    CC: Shaohua Li
    Fixes: b685d3d65ac791406e0dfd8779cc9b3707fea5a3
    CC: stable@vger.kernel.org
    Signed-off-by: Jan Kara
    Signed-off-by: Shaohua Li

    Jan Kara
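The regression can be sketched with stand-in flag bits (values invented for the sketch; the kernel's op_is_sync() plays the role of demo_is_sync()):

```c
#include <assert.h>

/* Invented flag values; op_is_sync() treats any of these as synchronous. */
#define DEMO_REQ_SYNC     (1u << 0)
#define DEMO_REQ_FUA      (1u << 1)
#define DEMO_REQ_PREFLUSH (1u << 2)

static int demo_is_sync(unsigned int opf)
{
    return !!(opf & (DEMO_REQ_SYNC | DEMO_REQ_FUA | DEMO_REQ_PREFLUSH));
}

/* What generic_make_request_checks() does on a device without a volatile
 * write cache: FUA/PREFLUSH are meaningless there and get stripped. */
static unsigned int demo_strip_flush_flags(unsigned int opf)
{
    return opf & ~(DEMO_REQ_FUA | DEMO_REQ_PREFLUSH);
}
```

A FUA-only write is synchronous before stripping but not after; OR-ing in REQ_SYNC at submission (the fix) keeps it synchronous either way.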
     

12 May, 2017

2 commits

  • Currently, sync of raid456 array cannot make progress when hitting
    data in writeback r5cache.

    This patch fixes this issue by flushing the cached data of the stripe
    before processing the sync request. This is achieved by:

    1. In handle_stripe(), do not set STRIPE_SYNCING if the stripe is
    in write back cache;
    2. In r5c_try_caching_write(), handle the stripe in sync with write
    through;
    3. In do_release_stripe(), make stripe in sync write out and send
    it to the state machine.

    Shaohua: explicitly set STRIPE_HANDLE after write out completed

    Signed-off-by: Song Liu
    Signed-off-by: Shaohua Li

    Song Liu
     
  • For raid456 with writeback cache, when the journal device fails during
    normal operation, it is still possible to persist all data, as all
    pending data is still in the stripe cache. However, it is necessary to
    handle journal failure gracefully.

    During journal failures, the following logic handles the graceful shutdown
    of journal:
    1. raid5_error() marks the device as Faulty and schedules async work
    log->disable_writeback_work;
    2. In disable_writeback_work (r5c_disable_writeback_async), the mddev is
    suspended, set to write through, and then resumed. mddev_suspend()
    flushes all cached stripes;
    3. All cached stripes need to be flushed carefully to the RAID array.

    This patch fixes issues within the process above:
    1. In r5c_update_on_rdev_error() schedule disable_writeback_work for
    journal failures;
    2. In r5c_disable_writeback_async(), wait for MD_SB_CHANGE_PENDING,
    since raid5_error() updates superblock.
    3. In handle_stripe(), allow stripes with data in journal (s.injournal > 0)
    to make progress during log_failed;
    4. In delay_towrite(), if log failed only process data in the cache (skip
    new writes in dev->towrite);
    5. In __get_priority_stripe(), process loprio_list during journal device
    failures.
    6. In raid5_remove_disk(), wait until all cached stripes are flushed
    before calling log_exit().

    Signed-off-by: Song Liu
    Signed-off-by: Shaohua Li

    Song Liu
     



04 May, 2017

1 commit

  • …/device-mapper/linux-dm

    Pull device mapper updates from Mike Snitzer:

    - A major update for DM cache that reduces the latency for deciding
    whether blocks should migrate to/from the cache. The bio-prison-v2
    interface supports this improvement by enabling direct dispatch of
    work to workqueues rather than having to delay the actual work
    dispatch to the DM cache core. So the dm-cache policies are much more
    nimble by being able to drive IO as they see fit. One immediate
    benefit from the improved latency is a cache that should be much more
    adaptive to changing workloads.

    - Add a new DM integrity target that emulates a block device that has
    additional per-sector tags that can be used for storing integrity
    information.

    - Add a new authenticated encryption feature to the DM crypt target
    that builds on the capabilities provided by the DM integrity target.

    - Add MD interface for switching the raid4/5/6 journal mode and update
    the DM raid target to use it to enable raid4/5/6 journal write-back
    support.

    - Switch the DM verity target over to using the asynchronous hash
    crypto API (this helps work better with architectures that have
    access to off-CPU algorithm providers, which should reduce CPU
    utilization).

    - Various request-based DM and DM multipath fixes and improvements from
    Bart and Christoph.

    - A DM thinp target fix for a bio structure leak that occurs for each
    discard IFF discard passdown is enabled.

    - A fix for a possible deadlock in DM bufio and a fix to re-check the
    new buffer allocation watermark in the face of competing admin
    changes to the 'max_cache_size_bytes' tunable.

    - A couple DM core cleanups.

    * tag 'for-4.12/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (50 commits)
    dm bufio: check new buffer allocation watermark every 30 seconds
    dm bufio: avoid a possible ABBA deadlock
    dm mpath: make it easier to detect unintended I/O request flushes
    dm mpath: cleanup QUEUE_IF_NO_PATH bit manipulation by introducing assign_bit()
    dm mpath: micro-optimize the hot path relative to MPATHF_QUEUE_IF_NO_PATH
    dm: introduce enum dm_queue_mode to cleanup related code
    dm mpath: verify __pg_init_all_paths locking assumptions at runtime
    dm: verify suspend_locking assumptions at runtime
    dm block manager: remove an unused argument from dm_block_manager_create()
    dm rq: check blk_mq_register_dev() return value in dm_mq_init_request_queue()
    dm mpath: delay requeuing while path initialization is in progress
    dm mpath: avoid that path removal can trigger an infinite loop
    dm mpath: split and rename activate_path() to prepare for its expanded use
    dm ioctl: prevent stack leak in dm ioctl call
    dm integrity: use previously calculated log2 of sectors_per_block
    dm integrity: use hex2bin instead of open-coded variant
    dm crypt: replace custom implementation of hex2bin()
    dm crypt: remove obsolete references to per-CPU state
    dm verity: switch to using asynchronous hash crypto API
    dm crypt: use WQ_HIGHPRI for the IO and crypt workqueues
    ...

    Linus Torvalds
     

27 Mar, 2017

1 commit

  • Commit 2ded370373a4 ("md/r5cache: State machine for raid5-cache write
    back mode") added support for "write-back" caching on the raid journal
    device.

    In order to allow the dm-raid target to switch between the available
    "write-through" and "write-back" modes, provide a new
    r5c_journal_mode_set() API.

    Use the new API in existing r5c_journal_mode_store()

    Signed-off-by: Heinz Mauelshagen
    Acked-by: Shaohua Li
    Signed-off-by: Mike Snitzer

    Heinz Mauelshagen
     



23 Mar, 2017

3 commits

  • md/raid5 needs to keep track of how many stripe_heads are processing a
    bio so that it can delay calling bio_endio() until all stripe_heads
    have completed. It currently uses 16 bits of ->bi_phys_segments for
    this purpose.

    16 bits is only enough for 256M requests, and it is possible for a
    single bio to be larger than this, which causes problems. Also, the
    bio struct contains a larger counter, __bi_remaining, which has a
    purpose very similar to the purpose of our counter. So stop using
    ->bi_phys_segments, and instead use __bi_remaining.

    This means we don't need to initialize the counter, as our caller
    initializes it to '1'. It also means we can call bio_endio() directly
    as it tests this counter internally.

    Signed-off-by: NeilBrown
    Signed-off-by: Shaohua Li

    NeilBrown
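The counting scheme can be sketched as follows (a plain int stands in for the atomic __bi_remaining; bio_inc_remaining() and bio_endio() are the real kernel names):

```c
#include <assert.h>

/* Sketch: the bio completes only when every attached stripe_head has
 * called endio. The counter starts at 1 (the caller's initialization). */
struct demo_bio { int remaining; int done; };

static void demo_bio_inc_remaining(struct demo_bio *bio)
{
    bio->remaining++;
}

static void demo_bio_endio(struct demo_bio *bio)
{
    if (--bio->remaining == 0)
        bio->done = 1;   /* really: complete the bio */
}

static int demo_two_stripes(void)
{
    struct demo_bio bio = { .remaining = 1 };  /* initialized by the caller */
    demo_bio_inc_remaining(&bio);   /* second stripe_head attaches */
    demo_bio_endio(&bio);           /* first stripe finishes: not done yet */
    int early = bio.done;
    demo_bio_endio(&bio);           /* last reference: bio completes */
    return early * 10 + bio.done;   /* completes only at the very end */
}
```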
     
  • We currently gather bios that need to be returned into a bio_list
    and call bio_endio() on them all together.
    The original reason for this was to avoid making the calls while
    holding a spinlock.
    Locking has changed a lot since then, and that reason is no longer
    valid.

    So discard return_io() and various return_bi lists, and just call
    bio_endio() directly as needed.

    Signed-off-by: NeilBrown
    Signed-off-by: Shaohua Li

    NeilBrown
     
  • We use md_write_start() to increase the count of pending writes, and
    md_write_end() to decrement the count. We currently count bios
    submitted to md/raid5. Change it to count the stripe_heads that a WRITE
    bio has been attached to.

    So now, raid5_make_request() calls md_write_start() and then
    md_write_end() to keep the count elevated during the setup of the
    request.

    add_stripe_bio() calls md_write_start() for each stripe_head, and the
    completion routines always call md_write_end(), instead of only
    calling it when raid5_dec_bi_active_stripes() returns 0.
    make_discard_request also calls md_write_start/end().

    The parallel between md_write_{start,end} and use of bi_phys_segments
    can be seen in that:
    Whenever we set bi_phys_segments to 1, we now call md_write_start.
    Whenever we increment it on non-read requests with
    raid5_inc_bi_active_stripes(), we now call md_write_start().
    Whenever we decrement bi_phys_segments on non-read requests with
    raid5_dec_bi_active_stripes(), we now call md_write_end().

    This reduces our dependence on keeping a per-bio count of active
    stripes in bi_phys_segments.

    md_write_inc() is added which parallels md_write_start(), but requires
    that a write has already been started, and is certain never to sleep.
    This can be used inside a spinlocked region when adding to a write
    request.

    Signed-off-by: NeilBrown
    Signed-off-by: Shaohua Li

    NeilBrown
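The new counting can be sketched with stand-in types (a plain int replaces the kernel's writes_pending machinery; function names follow the patch):

```c
#include <assert.h>

struct demo_mddev { int writes_pending; };

static void demo_md_write_start(struct demo_mddev *m) { m->writes_pending++; }
/* never-sleeping variant, callable under a spinlock once a write exists */
static void demo_md_write_inc(struct demo_mddev *m)   { m->writes_pending++; }
static void demo_md_write_end(struct demo_mddev *m)   { m->writes_pending--; }

/* raid5_make_request pattern: hold one reference across request setup,
 * take one per attached stripe_head, then drop the setup reference. */
static int demo_make_request(struct demo_mddev *m, int nr_stripes)
{
    demo_md_write_start(m);              /* may sleep; taken up front */
    for (int i = 0; i < nr_stripes; i++)
        demo_md_write_inc(m);            /* one per stripe_head attached */
    demo_md_write_end(m);                /* drop the setup reference */
    return m->writes_pending;            /* stripes still in flight */
}

static int demo_pending_after_setup(int nr_stripes)
{
    struct demo_mddev mddev = { 0 };
    return demo_make_request(&mddev, nr_stripes);
}
```

Each stripe_head's completion routine then calls md_write_end() once, bringing the count back to zero when all writes finish.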
     

17 Mar, 2017

2 commits

  • In r5c_finish_stripe_write_out(), R5LOG_PAYLOAD_FLUSH is appended to
    log->current_io.

    Appending R5LOG_PAYLOAD_FLUSH during quiesce needs extra writes to the
    journal. To simplify the logic, we just skip R5LOG_PAYLOAD_FLUSH in
    quiesce.

    Although R5LOG_PAYLOAD_FLUSH supports multiple stripes per payload, the
    current implementation writes one stripe per R5LOG_PAYLOAD_FLUSH, which
    is simpler.

    Signed-off-by: Song Liu
    Signed-off-by: Shaohua Li

    Song Liu
     
  • This patch adds handling of R5LOG_PAYLOAD_FLUSH in journal recovery.
    The next patch will add logic that generates R5LOG_PAYLOAD_FLUSH on
    flush finish.

    When R5LOG_PAYLOAD_FLUSH is seen in recovery, pending data and parity
    will be dropped from recovery. This will reduce the number of stripes
    to replay, and thus accelerate the recovery process.

    Signed-off-by: Song Liu
    Signed-off-by: Shaohua Li

    Song Liu