13 Jan, 2021

1 commit

  • commit d16baa3f1453c14d680c5fee01cd122a22d0e0ce upstream.

    When initializing iocost for a queue, its rqos should be registered before
    the blkcg policy is activated to allow policy data initialization to look up
    the associated ioc. This unfortunately means that the rqos methods can be
    called on bios before iocgs are attached to all existing blkgs.

    While the race is theoretically possible on ioc_rqos_throttle(), it mostly
    happened in ioc_rqos_merge() due to the difference in how they look up the
    ioc. The former determines it from the passed-in @rqos and then bails
    before dereferencing the iocg if the looked-up ioc is disabled, which most
    likely is the case while initialization is still in progress. The latter
    looked up the ioc by dereferencing the possibly NULL iocg, making it a lot
    more prone to actually triggering the bug.

    * Make ioc_rqos_merge() use the same method as ioc_rqos_throttle() to look
    up ioc for consistency.

    * Make ioc_rqos_throttle() and ioc_rqos_merge() test for NULL iocg before
    dereferencing it.

    * Explain the danger of NULL iocgs in blk_iocost_init().
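
    A minimal sketch of the resulting guards (abridged; rqos_to_ioc() and
    blkg_to_iocg() are blk-iocost's existing lookup helpers):

        static void ioc_rqos_merge(struct rq_qos *rqos, struct request *rq,
                                   struct bio *bio)
        {
                struct ioc *ioc = rqos_to_ioc(rqos);   /* from @rqos, not iocg */
                struct ioc_gq *iocg = blkg_to_iocg(bio->bi_blkg);

                /* bail if disabled or the iocg isn't hooked up yet */
                if (!ioc->enabled || !iocg)
                        return;
                ...
        }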

    Signed-off-by: Tejun Heo
    Reported-by: Jonathan Lemon
    Cc: stable@vger.kernel.org # v5.4+
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     

25 Sep, 2020

6 commits

  • An iocg may have 0 debt but non-zero delay. The current debt forgiveness
    logic doesn't act on such iocgs. This can lead to unexpected behaviors - an
    iocg with a little bit of debt will have its delay canceled through debt
    forgiveness, but one without any debt but with an active delay has to wait
    until the delay decays away on its own.

    This patch updates the debt handling logic so that it treats delays the same
    as debts. If either debt or delay is active, debt forgiveness logic kicks in
    and acts on both the same way.

    Also, avoid turning the debt and delay directly to zero as that can confuse
    state transitions.
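
    One plausible shape of the widened trigger, assuming the abs_vdebt and
    delay fields blk-iocost keeps per iocg:

        /* in ioc_forgive_debts(): act when either debt or delay is live */
        list_for_each_entry(iocg, &ioc->active_iocgs, active_list) {
                if (!iocg->abs_vdebt && !iocg->delay)
                        continue;
                /* ... forgive debt and delay the same way ... */
        }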

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Debt forgiveness logic was counting the number of consecutive !busy periods
    as the trigger condition. While this usually works, it can easily be thrown
    off by temporary fluctuations especially on configurations w/ short periods.

    This patch reimplements debt forgiveness so that:

    * The average usage over the forgiveness period is used instead of counting
    consecutive periods.

    * Debt is reduced at around the target rate (1/2 every 100ms) regardless of
    ioc period duration.

    * The usage threshold is raised to 50%. Combined with the preceding changes
    and the switch to average usage, this makes debt forgiveness a lot more
    effective at reducing the amount of unnecessary idleness.

    * Constants are renamed with DFGV_ prefix.
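
    A rough sketch of the decay step under these rules, assuming dur holds the
    elapsed forgiveness time (note the values never drop straight to zero,
    matching the earlier state-transition fix):

        /* halve once per DFGV_PERIOD (100ms), independent of period_us */
        nr_cycles = dur / DFGV_PERIOD;
        if (iocg->abs_vdebt)
                iocg->abs_vdebt = iocg->abs_vdebt >> nr_cycles ?: 1;
        if (iocg->delay)
                iocg->delay = iocg->delay >> nr_cycles ?: 1;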

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Debt sets the initial delay duration which is decayed over time. The current
    debt reduction halved the debt but didn't change the delay. It prevented
    future debts from increasing delay but didn't do anything to lower the
    existing delay, limiting the mechanism's ability to reduce unnecessary
    idling.

    Reset iocg->delay to 0 after debt reduction so that iocg_kick_waitq()
    recalculates the delay based on the reduced debt amount.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Debt reduction was blocked if any iocg was short on budget in the past
    period to avoid reducing debts while some iocgs are saturated. However, this
    ends up unnecessarily blocking debt reduction due to temporary local
    imbalances when the device is generally being underutilized, while also
    failing to block when the underlying device is overwhelmed and the usage
    becomes low from high latency.

    Given that debt accumulation mostly happens with swapout bursts which can
    significantly deteriorate the underlying device's latency response, the
    current logic is not great.

    Let's replace it with an ioc->busy_level based condition so that debt
    reduction is blocked while the underlying device is saturated. The
    ioc_forgive_debts() call is moved after busy_level determination, as
    sketched below.
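
    Sketched, the gate is a simple early return (busy_level is positive when
    the device can't keep up and negative when it's underutilized):

        /* in ioc_forgive_debts(), after busy_level has been settled */
        if (ioc->busy_level > 0)
                return;   /* saturated - don't reduce debts this period */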

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Debt reduction logic is going to be improved and expanded. Factor it out
    into ioc_forgive_debts() and generalize the comment a bit. No functional
    change.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     

15 Sep, 2020

1 commit

  • adjust_inuse_and_calc_cost() is responsible for reducing the amount of
    donated weights dynamically in period as the budget runs low. Because we
    don't want to do full donation calculation in period, we keep latching up
    inuse by INUSE_ADJ_STEP_PCT of the active weight of the cgroup until the
    resulting hweight_inuse is satisfactory.

    Unfortunately, the adj_step calculation was reading the active weight before
    acquiring ioc->lock. Because the current thread could have lost race to
    activate the iocg to another thread before entering this function, it may
    read the active weight as zero before acquiring ioc->lock. When this
    happens, the adj_step is calculated as zero and the incremental adjustment
    loop becomes an infinite one.

    Fix it by fetching the active weight after acquiring ioc->lock.
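
    In code, the fix amounts to moving the read under the lock; a sketch of
    the adjustment loop as described above:

        spin_lock_irq(&ioc->lock);

        /* ->active may read as 0 before activation completes elsewhere;
         * fetching it under ioc->lock guarantees a non-zero adj_step and
         * thus a terminating loop */
        adj_step = DIV_ROUND_UP(iocg->active * INUSE_ADJ_STEP_PCT, 100);

        new_inuse = iocg->inuse;
        do {
                new_inuse = new_inuse + adj_step;
                propagate_weights(iocg, iocg->active, new_inuse, true, now);
                current_hweight(iocg, NULL, &hwi);
                cost = abs_cost_to_cost(abs_cost, hwi);
        } while (time_after64(vtime + cost, now->vnow) &&
                 iocg->inuse != iocg->active);

        spin_unlock_irq(&ioc->lock);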

    Fixes: b0853ab4a238 ("blk-iocost: revamp in-period donation snapbacks")
    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     

12 Sep, 2020

1 commit

  • Conceptually, root_iocg->hweight_donating must be less than WEIGHT_ONE, but
    all hweight calculations round up and thus it may end up >= WEIGHT_ONE,
    triggering divide-by-zero and other issues. Bound the value to avoid
    surprises.
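
    One minimal way to apply the bound (hweights are WEIGHT_ONE-based fixed
    point, so clamp just below full weight):

        /* rounding can push the aggregate to >= WEIGHT_ONE; bound it */
        root_iocg->hweight_donating =
                min_t(u32, root_iocg->hweight_donating, WEIGHT_ONE - 1);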

    Fixes: e08d02aa5fc9 ("blk-iocost: implement Andy's method for donation weight updates")
    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     

02 Sep, 2020

25 commits

  • These are really cheap to collect and can be useful in debugging iocost
    behavior. Add them as debug stats for now.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Update and restore the inuse update tracepoints.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • When an iocg accumulates too much vtime or gets deactivated, we throw away
    some vtime, which lowers the overall device utilization. As the exact amount
    which is being thrown away is known, we can compensate by accelerating the
    vrate accordingly so that the extra vtime generated in the current period
    matches what got lost.

    This significantly improves work conservation when involving high weight
    cgroups with intermittent and bursty IO patterns.
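
    The arithmetic, roughly (vtime_err follows the accumulator this patch
    introduces; period_left, base_vrate and vcomp are illustrative locals, and
    the real code also bounds the compensation):

        /* remember how much vtime was thrown away ... */
        ioc->vtime_err += vtime_lost;

        /* ... and regenerate it over the remainder of this period */
        vcomp = div64_s64(ioc->vtime_err, period_left);
        vrate = base_vrate + vcomp;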

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • A low weight iocg can amass a large amount of debt, for example, when
    anonymous memory gets reclaimed aggressively. If the system has a lot of
    memory paired with a slow IO device, the debt can span multiple seconds or
    more. If there are no other subsequent IO issuers, the in-debt iocg may end
    up blocked paying its debt while the IO device is idle.

    This patch implements a mechanism to protect against such pathological
    cases. If the device has been sufficiently idle for a substantial amount of
    time, the debts are halved. The criteria are on the conservative side as we
    want to resolve the rare extreme cases without impacting regular operation
    by forgiving debts too readily.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Currently, iocost syncs the delay duration to the outstanding debt amount,
    which seemed enough to protect the system from anon memory hogs. However,
    that was mostly because the delay calculation was using hweight_inuse,
    which quickly converges towards zero under debt, often punishing debtors
    overly harshly for longer than deserved.

    The previous patch fixed the delay calculation and now the protection against
    anonymous memory hogs isn't enough because the effect of delay is indirect
    and non-linear and a huge amount of future debt can accumulate abruptly
    while unthrottled.

    This patch implements delay hysteresis so that delay is decayed
    exponentially over time instead of getting cleared immediately as debt is
    paid off. While the overall behavior is similar to the blk-cgroup
    implementation used by blk-iolatency, a lot of the details are different and
    due to the empirical nature of the mechanism, it's challenging to adapt the
    mechanism for one controller without negatively impacting the other.

    As the delay is gradually decayed now, there's no point in running it from
    its own hrtimer. Periodic updates are now performed from ioc_timer_fn() and
    the dedicated hrtimer is removed.
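
    The decay itself is simple; roughly, with delay_at marking when
    iocg->delay was last set:

        /* delay in effect halves every second since it was last set */
        tdelta = now->now - iocg->delay_at;
        if (iocg->delay)
                delay = iocg->delay >> div64_u64(tdelta, USEC_PER_SEC);
        else
                delay = 0;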

    Signed-off-by: Tejun Heo
    Cc: Josef Bacik
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Debt handling had several issues.

    * How much inuse a debtor carries wasn't clearly defined. inuse would be
    driven down over time from not issuing IOs but it'd be better to clamp it
    to minimum immediately once in debt.

    * How much can be paid off was determined by hweight_inuse. As inuse was
    driven down, the payment amount would fall together regardless of the
    debtor's active weight. This means that the debtors were punished harshly.

    * ioc_rqos_merge() wasn't calling blkcg_schedule_throttle() after
    iocg_kick_delay().

    This patch revamps debt handling so that

    * Debt handling owns inuse for iocgs in debt and keeps them at zero.

    * Payment amount is determined by hweight_active. This is more deterministic
    and safer than hweight_inuse but still far from ideal in that it doesn't
    factor in possible donations from other iocgs for debt payments. This
    likely needs further improvements in the future.

    * ioc_rqos_merge() now calls blkcg_schedule_throttle() as necessary.
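
    A sketch of the first point - entering debt hands inuse over to debt
    handling, which pins it at the minimum until the debt is paid off:

        static void iocg_incur_debt(struct ioc_gq *iocg, u64 abs_cost,
                                    struct ioc_now *now)
        {
                /* once in debt, debt handling owns inuse; keep it minimal */
                if (!iocg->abs_vdebt && abs_cost)
                        propagate_weights(iocg, iocg->active, 0, false, now);

                iocg->abs_vdebt += abs_cost;
        }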

    Signed-off-by: Tejun Heo
    Cc: Andy Newell
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • When the margin drops below the minimum on a donating iocg, donation is
    immediately canceled in full. There are a couple of shortcomings in the
    current behavior.

    * It's abrupt. A small temporary budget deficit can lead to a wide swing in
    weight allocation and a large surplus.

    * It's open coded in the issue path but not implemented for the merge path.
    A series of merges at a low inuse can make the iocg incur debts and stall
    incorrectly.

    This patch reimplements in-period donation snapbacks so that

    * inuse adjustment and cost calculations are factored into
    adjust_inuse_and_calc_cost() which is called from both the issue and merge
    paths.

    * Snapbacks are more gradual. They occur in quarter steps.

    * A snapback triggers if the margin goes below the low threshold and is
    lower than the budget at the time of the last adjustment.

    * For the above, __propagate_weights() stores the margin in
    iocg->saved_margin. Move iocg->last_inuse storing together into
    __propagate_weights() for consistency.

    * Full snapback is guaranteed when there are waiters.

    * With precise donation and gradual snapbacks, inuse adjustments are now a
    lot more effective and the value of scaling inuse on weight changes isn't
    clear. Removed inuse scaling from weight_update().

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • iocost has various safety nets to combat inuse adjustment calculation
    inaccuracies. With Andy's method implemented in transfer_surpluses(), inuse
    adjustment calculations are now accurate and we can make donation amount
    determinations accurate too.

    * Stop keeping track of past usage history and using the maximum. Act on the
    immediate usage information.

    * Remove donation constraints defined by SURPLUS_* constants. Donate
    whatever isn't used.

    * Determine the donation amount so that the iocg will end up with
    MARGIN_TARGET_PCT budget at the end of the coming period assuming the same
    usage as the previous period. TARGET is set at 50% of period, which is the
    previous maximum. This provides smooth convergence for most repetitive IO
    patterns.

    * Apply donation logic early at 20% budget. There's no risk in doing so as
    the calculation is based on the delta between the current budget and the
    target budget at the end of the coming period.

    * Remove preemptive iocg activation for zero cost IOs. As donation can reach
    near zero now, the mere activation doesn't provide any protection anymore.
    In the unlikely case that this becomes a problem, the right solution is
    assigning appropriate costs for such IOs.

    This significantly improves the donation determination logic while also
    simplifying it. Now all donations are immediate, exact and smooth.

    Signed-off-by: Tejun Heo
    Cc: Andy Newell
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • iocost implements work conservation by reducing iocg->inuse and propagating
    the adjustment upwards proportionally. However, while I knew the target
    absolute hierarchical proportion - adjusted hweight_inuse, I couldn't figure
    out how to determine the iocg->inuse adjustment to achieve that and
    approximated the adjustment by scaling iocg->inuse using the proportion of
    the needed hweight_inuse changes.

    When nested, these scalings aren't accurate even when adjusting a single
    node as the donating node also receives the benefit of the donated portion.
    When multiple nodes are donating as they often do, they can be wildly wrong.

    iocost employed various safety nets to combat the inaccuracies. There are
    ample buffers in determining how much to donate, the adjustments are
    conservative and gradual. While it can achieve a reasonable level of work
    conservation in simple scenarios, the inaccuracies can easily add up leading
    to significant loss of total work. This in turn makes it difficult to
    closely cap vrate as vrate adjustment is needed to compensate for the loss
    of work. The combination of inaccurate donation calculations and vrate
    adjustments can lead to wide fluctuations and clunky overall behaviors.

    Andy Newell devised a method to calculate the needed ->inuse updates to
    achieve the target hweight_inuse values. The method is compatible with the
    proportional inuse adjustment propagation, which allows all hot path
    operations to stay local to each iocg.

    To roughly summarize, Andy's method divides the tree into donating and
    non-donating parts, calculates global donation rate which is used to
    determine the target hweight_inuse for each node, and then derives per-level
    proportions. There's a non-trivial amount of math involved. Please refer to
    the following pdfs for detailed descriptions.

    https://drive.google.com/file/d/1PsJwxPFtjUnwOY1QJ5AeICCcsL7BM3bo
    https://drive.google.com/file/d/1vONz1-fzVO7oY5DXXsLjSxEtYYQbOvsE
    https://drive.google.com/file/d/1WcrltBOSPN0qXVdBgnKm4mdp9FhuEFQN

    This patch implements Andy's method in transfer_surpluses(). This makes the
    donation calculations accurate per cycle and enables further improvements in
    other parts of the donation logic.

    Signed-off-by: Tejun Heo
    Cc: Andy Newell
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • The way the surplus donation logic is structured isn't great. There are two
    separate paths for starting/increasing donations and decreasing them,
    making the logic harder to follow and prone to unnecessary behavior
    differences.

    In preparation for improved donation handling, this patch restructures the
    code so that

    * All donors - new, increasing and decreasing - are funneled through the
    same code path.

    * The target donation calculation is factored into hweight_after_donation()
    which is called once from the same spot for all possible donors.

    * Actual inuse adjustment is factored into transfer_surpluses().

    This change introduces a few behavior differences - e.g. donation amount
    reduction now uses the max usage of the recent three periods just like new
    and increasing donations, and inuse now gets adjusted upwards the same way
    it gets downwards. These differences are unlikely to have severely negative
    implications and the whole logic will be revamped soon.

    This patch also removes two tracepoints. The existing TPs don't quite fit
    the new implementation. A later patch will update and reinstate them.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Budget donations are inaccurate and could take multiple periods to converge.
    To prevent triggering vrate adjustments while surplus transfers were
    catching up, vrate adjustment was suppressed if donations were increasing,
    which was indicated by non-zero nr_surpluses.

    This entangling won't be necessary with the scheduled rewrite of donation
    mechanism which will make it precise and immediate. Let's decouple the two
    in preparation.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Instead of marking iocgs with surplus with a flag and filtering for them
    while walking all active iocgs, build a surpluses list. This doesn't make
    much difference now but will help implementing improved donation logic which
    will iterate iocgs with surplus multiple times.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Currently, iocg->usages[] which are used to guide inuse adjustments are
    calculated from vtime deltas. This, however, assumes that the hierarchical
    inuse weight at the time of calculation held for the entire period, which
    often isn't true and can lead to significant errors.

    Now that we have absolute usage information collected, we can derive
    iocg->usages[] from iocg->local_stat.usage_us so that inuse adjustment
    decisions are made based on actual absolute usage. The calculated usage is
    clamped between 1 and WEIGHT_ONE and WEIGHT_ONE is also used to signal
    saturation regardless of the current hierarchical inuse weight.
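
    The conversion might look like this, with usage_us the absolute usage
    collected for the period and usage_dur the wall time it covers:

        /* absolute usage -> WEIGHT_ONE-based ratio, clamped as described */
        usage = clamp_t(u32,
                        DIV64_U64_ROUND_UP(usage_us * WEIGHT_ONE, usage_dur),
                        1, WEIGHT_ONE);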

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Currently, iocost doesn't collect or expose any statistics, punting off all
    monitoring duties to the drgn-based iocost_monitor.py. While that works for
    some scenarios, there are usability and data availability challenges. For
    example, accurate per-cgroup usage information can't be tracked by vtime
    progression at all, and the numbers available in iocg->usages[] are really
    short-term snapshots used for control heuristics with possibly significant
    errors.

    This patch implements per-cgroup absolute usage stat counter and exposes it
    through io.stat along with the current vrate. Usage stat collection and
    flushing employ the same method as cgroup rstat on the active iocg's and the
    only hot path overhead is preemption toggling and adding to a percpu
    counter.
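
    The hot path cost mentioned above is, in sketch form, one preemption
    toggle plus an add to a percpu counter:

        /* per-bio accounting */
        gcs = get_cpu_ptr(iocg->pcpu_stat);     /* disables preemption */
        local64_add(abs_cost, &gcs->abs_vusage);
        put_cpu_ptr(gcs);                       /* re-enables preemption */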

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Currently, debt handling requires only iocg->waitq.lock. In the future, we
    want to adjust and propagate inuse changes depending on debt status. Let's
    grab ioc->lock in debt handling paths in preparation.

    * Because ioc->lock nests outside iocg->waitq.lock, the decision to grab
    ioc->lock needs to be made before entering the critical sections.

    * Add and use iocg_[un]lock() which handles the conditional double locking.

    * Add @pay_debt to iocg_kick_waitq() so that debt payment happens only when
    the caller grabbed both locks.

    This patch is preparatory and the comments contain references to future
    changes.
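
    A sketch of the conditional double locking, following the nesting rule
    above:

        static void iocg_lock(struct ioc_gq *iocg, bool lock_ioc,
                              unsigned long *flags)
        {
                if (lock_ioc) {
                        spin_lock_irqsave(&iocg->ioc->lock, *flags);
                        spin_lock(&iocg->waitq.lock);
                } else {
                        spin_lock_irqsave(&iocg->waitq.lock, *flags);
                }
        }

        static void iocg_unlock(struct ioc_gq *iocg, bool unlock_ioc,
                                unsigned long *flags)
        {
                if (unlock_ioc) {
                        spin_unlock(&iocg->waitq.lock);
                        spin_unlock_irqrestore(&iocg->ioc->lock, *flags);
                } else {
                        spin_unlock_irqrestore(&iocg->waitq.lock, *flags);
                }
        }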

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • The margin handling was pretty inconsistent.

    * ioc->margin_us and ioc->inuse_margin_vtime were used as vtime margin
    thresholds. However, the two are in different units with the former
    requiring conversion to vtime on use.

    * iocg_kick_waitq() was using a quarter of WAITQ_TIMER_MARGIN_PCT of
    period_us as the timer slack - ~1.2% - while iocg_kick_delay() was using a
    quarter of ioc->margin_us - ~12.5%. There aren't strong reasons to use
    different values for the two.

    This patch cleans up margin and timer slack handling:

    * vtime margins are now recorded in ioc->margins.{min, max} on period
    duration changes and used consistently.

    * Timer slack is now 1% of period_us and recorded in ioc->timer_slack_ns and
    used consistently for iocg_kick_waitq() and iocg_kick_delay().

    The only functional change is shortening of timer slack. No meaningful
    visible change is expected.
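
    Sketched, the recomputation on period duration change (MARGIN_*_PCT and
    TIMER_SLACK_PCT as percentage constants, vrate the current vtime rate):

        /* vtime margins, derived once per period-duration change */
        margins->min = (period_us * MARGIN_MIN_PCT / 100) * vrate;
        margins->max = (period_us * MARGIN_MAX_PCT / 100) * vrate;

        /* timer slack: 1% of period_us, kept in nanoseconds */
        ioc->timer_slack_ns = div64_u64((u64)period_us * NSEC_PER_USEC *
                                        TIMER_SLACK_PCT, 100);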

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • They are in microseconds and, as u32, wrap around in about 1.2 hours. While
    unlikely, confusion from wraparounds is still possible. We aren't saving
    anything meaningful by keeping these u32. Let's make them u64.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • To improve weight donations, we want to be able to scale inuse with greater
    accuracy and down below 1. Let's make non-hierarchical weights use
    WEIGHT_ONE based fixed point numbers too, like hierarchical ones.

    This doesn't cause any behavior changes yet.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • We're gonna use HWEIGHT_WHOLE for regular weights too. Let's rename it to
    WEIGHT_ONE.

    Pure rename.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • iocg_kick_waitq() is the function which pays debt and iocg_kick_delay()
    updates the actual delay status accordingly. If iocg_kick_delay() is not
    called after iocg_kick_waitq() updated the debt, unnecessarily large delays
    can be applied temporarily.

    Let's make sure such conditions don't occur by making iocg_kick_waitq()
    always call iocg_kick_delay() after paying debt.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • We'll make iocg_kick_waitq() call iocg_kick_delay(). Reorder them in
    preparation. This is pure code reorganization.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • __propagate_weights() currently expects the callers to clamp inuse within
    [1, active], which is needlessly fragile. The inuse adjustment logic is
    going to be revamped; in preparation, let's make __propagate_weights()
    clamp inuse on entry.

    Also, make it avoid weight updates altogether if neither active nor inuse
    has changed.
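
    In sketch form, the entry of __propagate_weights() becomes:

        /* clamp on entry instead of trusting every caller */
        inuse = clamp_t(u32, inuse, 1, active);

        /* noop if nothing changed - skip the propagation walk */
        if (active == iocg->active && inuse == iocg->inuse)
                return;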

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • It already propagates two weights - active and inuse - and there will be
    another soon. Let's drop the confusing misnomers. Rename
    [__]propagate_active_weights() to [__]propagate_weights() and
    commit_active_weights() to commit_weights().

    This is pure rename.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • blk-iocost has been reading percpu stat counters from remote cpus, which on
    some archs can lead to torn reads on really rare occasions. Use local[64]_t
    for those counters.
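
    In sketch form - the owning cpu updates with local64_add() while a remote
    reader, e.g. the timer function, gets a tear-free value:

        /* hot path, owning cpu */
        local64_add(rq_wait_ns, &stat->rq_wait_ns);

        /* remote read - single-copy atomic even on 32-bit archs */
        u64 v = local64_read(&stat->rq_wait_ns);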

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • ioc_pd_free() grabs the irq-safe ioc->lock without ensuring that irqs are
    disabled, even though it can be called with irqs either disabled or
    enabled. This has a small chance of causing A-A deadlocks and triggers
    lockdep splats. Use irqsave operations instead.
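
    The pattern, for reference (irqsave/irqrestore work regardless of the
    caller's irq state):

        unsigned long flags;

        spin_lock_irqsave(&ioc->lock, flags);
        /* ... free per-cgroup policy data ... */
        spin_unlock_irqrestore(&ioc->lock, flags);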

    Signed-off-by: Tejun Heo
    Fixes: 7caa47151ab2 ("blkcg: implement blk-iocost")
    Cc: stable@vger.kernel.org # v5.4+
    Signed-off-by: Jens Axboe

    Tejun Heo
     

11 Aug, 2020

1 commit

  • Pull locking updates from Thomas Gleixner:
    "A set of locking fixes and updates:

    - Untangle the header spaghetti which causes build failures in
    various situations caused by the lockdep additions to seqcount to
    validate that the write side critical sections are non-preemptible.

    - The seqcount associated lock debug addons which were blocked by the
    above fallout.

    seqcount writers contrary to seqlock writers must be externally
    serialized, which usually happens via locking - except for strict
    per CPU seqcounts. As the lock is not part of the seqcount, lockdep
    cannot validate that the lock is held.

    This new debug mechanism adds the concept of associated locks.
    sequence count has now lock type variants and corresponding
    initializers which take a pointer to the associated lock used for
    writer serialization. If lockdep is enabled the pointer is stored
    and write_seqcount_begin() has a lockdep assertion to validate that
    the lock is held.

    Aside of the type and the initializer no other code changes are
    required at the seqcount usage sites. The rest of the seqcount API
    is unchanged and determines the type at compile time with the help
    of _Generic which is possible now that the minimal GCC version has
    been moved up.

    Adding this lockdep coverage unearthed a handful of seqcount bugs
    which have been addressed already independent of this.

    While generally useful this comes with a Trojan Horse twist: On RT
    kernels the write side critical section can become preemptible if
    the writers are serialized by an associated lock, which leads to
    the well known reader preempts writer livelock. RT prevents this by
    storing the associated lock pointer independent of lockdep in the
    seqcount and changing the reader side to block on the lock when a
    reader detects that a writer is in the write side critical section.

    - Conversion of seqcount usage sites to associated types and
    initializers"

    * tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
    locking/seqlock, headers: Untangle the spaghetti monster
    locking, arch/ia64: Reduce header dependencies by moving XTP bits into the new header
    x86/headers: Remove APIC headers from <asm/smp.h>
    seqcount: More consistent seqprop names
    seqcount: Compress SEQCNT_LOCKNAME_ZERO()
    seqlock: Fold seqcount_LOCKNAME_init() definition
    seqlock: Fold seqcount_LOCKNAME_t definition
    seqlock: s/__SEQ_LOCKDEP/__SEQ_LOCK/g
    hrtimer: Use sequence counter with associated raw spinlock
    kvm/eventfd: Use sequence counter with associated spinlock
    userfaultfd: Use sequence counter with associated spinlock
    NFSv4: Use sequence counter with associated spinlock
    iocost: Use sequence counter with associated spinlock
    raid5: Use sequence counter with associated spinlock
    vfs: Use sequence counter with associated spinlock
    timekeeping: Use sequence counter with associated raw spinlock
    xfrm: policy: Use sequence counters with associated lock
    netfilter: nft_set_rbtree: Use sequence counter with associated rwlock
    netfilter: conntrack: Use sequence counter with associated spinlock
    sched: tasks: Use sequence counter with associated spinlock
    ...

    Linus Torvalds
     

29 Jul, 2020

1 commit

  • A sequence counter write side critical section must be protected by some
    form of locking to serialize writers. A plain seqcount_t does not
    contain the information of which lock must be held when entering a write
    side critical section.

    Use the new seqcount_spinlock_t data type, which allows to associate a
    spinlock with the sequence counter. This enables lockdep to verify that
    the spinlock used for writer serialization is held when the write side
    critical section is entered.

    If lockdep is disabled this lock association is compiled out and has
    neither storage size nor runtime overhead.
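
    For blk-iocost this boils down to a type and initializer change; a sketch
    assuming the period_seqcount/ioc->lock pairing used there:

        /* before */
        seqcount_t              period_seqcount;
        seqcount_init(&ioc->period_seqcount);

        /* after: the seqcount knows which lock serializes its writers */
        seqcount_spinlock_t     period_seqcount;
        seqcount_spinlock_init(&ioc->period_seqcount, &ioc->lock);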

    Signed-off-by: Ahmed S. Darwish
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Daniel Wagner
    Link: https://lkml.kernel.org/r/20200720155530.1173732-21-a.darwish@linutronix.de

    Ahmed S. Darwish
     

24 Jun, 2020

1 commit

  • Make use of the struct_size() helper instead of an open-coded version
    in order to avoid any potential type mistakes.

    This code was detected with the help of Coccinelle and audited and fixed
    manually.
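
    In blk-iocost terms the change looks roughly like this, assuming the
    flexible ancestors[] array at the end of struct ioc_gq:

        /* open-coded size calculation */
        iocg = kzalloc_node(sizeof(*iocg) + levels * sizeof(iocg->ancestors[0]),
                            gfp, q->node);

        /* with struct_size(): type-checked and overflow-safe */
        iocg = kzalloc_node(struct_size(iocg, ancestors, levels), gfp, q->node);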

    Signed-off-by: Gustavo A. R. Silva
    Addresses-KSPP-ID: https://github.com/KSPP/linux/issues/83
    Signed-off-by: Jens Axboe

    Gustavo A. R. Silva