09 Jul, 2018

3 commits

  • Current IO controllers for the block layer are less than ideal for our
    use case. The io.max controller is great at hard limiting, but it is
    not work conserving. This patch introduces io.latency. You provide a
    latency target for your group and we monitor the io in short windows to
    make sure we are not exceeding those latency targets. This makes use of
    the rq-qos infrastructure and works much like the wbt stuff. There are
    a few differences from wbt:

    - It's bio based, so the latency covers the whole block layer in addition to
    the actual io.
    - We will throttle all IO types that come in here if we need to.
    - We use the mean latency over the 100ms window. This is because writes can
    be particularly fast, which could give us a false sense of the impact of
    other workloads on our protected workload.
    - By default there's no throttling; we set the queue_depth to INT_MAX so that
    we can have as many outstanding bios as we're allowed to. Only at
    throttle time do we pay attention to the actual queue depth.
    - We backcharge cgroups for root-cg-issued IO and induce artificial
    delays in order to deal with cases like metadata-only or swap-heavy
    workloads.
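
    As a rough sketch of how the interface described above might be used from
    user space (the "protected" cgroup name, the 8:0 device numbers, and the
    target value are made-up examples; the target unit is assumed here to be
    microseconds and should be checked against the cgroup-v2 documentation):

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/stat.h>

    /* Hypothetical helper: write a string to a cgroup-v2 control file. */
    static void cg_write(const char *path, const char *val)
    {
        FILE *f = fopen(path, "w");

        if (!f) {
            perror(path);
            exit(1);
        }
        fprintf(f, "%s\n", val);
        fclose(f);
    }

    int main(void)
    {
        /* Make the io controller available to child groups (assumed hierarchy). */
        cg_write("/sys/fs/cgroup/cgroup.subtree_control", "+io");
        /* Create the protected group (example name). */
        mkdir("/sys/fs/cgroup/protected", 0755);
        /* Set a latency target for device 8:0 (value assumed to be in microseconds). */
        cg_write("/sys/fs/cgroup/protected/io.latency", "8:0 target=10000");
        return 0;
    }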

    In testing this has worked out relatively well. Protected workloads
    will throttle noisy workloads down to 1 IO at a time if they are doing
    normal IO on their own, or induce up to a 1 second delay per syscall if
    they are doing a lot of root issued IO (metadata/swap IO).

    Our testing has revolved mostly around our production web servers where
    we have hhvm (the web server application) in a protected group and
    everything else in another group. We see slightly higher requests per
    second (RPS) on the test tier vs the control tier, and much more stable
    RPS across all machines in the test tier vs the control tier.

    Another test we run is a slow memory allocator in the unprotected group.
    Before this would eventually push us into swap and cause the whole box
    to die and not recover at all. With these patches we see slight RPS
    drops (usually 10-15%) before the memory consumer is properly killed and
    things recover within seconds.

    Signed-off-by: Josef Bacik
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Josef Bacik
     
  • blkcg-qos is going to do essentially what wbt does, only on a cgroup
    basis. Break out the common code that will be shared between blkcg-qos
    and wbt into blk-rq-qos.* so they can both utilize the same
    infrastructure.

    Signed-off-by: Josef Bacik
    Signed-off-by: Jens Axboe

    Josef Bacik
     
  • Exclude zoned block device members from struct request_queue for
    CONFIG_BLK_DEV_ZONED == n. Avoid breaking the build by only building
    the code that uses these struct request_queue members if
    CONFIG_BLK_DEV_ZONED != n.

    Signed-off-by: Bart Van Assche
    Reviewed-by: Damien Le Moal
    Cc: Matias Bjorling
    Cc: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Bart Van Assche
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boilerplate text.
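
    For example, a C source file that falls under the kernel's default license
    would carry the identifier as its first line (illustrative only; headers
    and assembly files use the /* */ comment form instead):

    // SPDX-License-Identifier: GPL-2.0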

    This patch is based on work done by Thomas Gleixner, Kate Stewart, and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information.

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier should be applied
    to a file was done in a spreadsheet of side-by-side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files, created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few thousand files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    should be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source.
    - File already had some variant of a license header in it (even if <5
    lines).

    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

09 Aug, 2017

1 commit

    Like pci and virtio, we add an rdma helper for affinity
    spreading. This achieves optimal mq affinity assignments
    according to the underlying rdma device affinity maps.

    Reviewed-by: Jens Axboe
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Max Gurtovoy
    Signed-off-by: Sagi Grimberg
    Signed-off-by: Doug Ledford

    Sagi Grimberg
     

19 Apr, 2017

2 commits

  • The BFQ I/O scheduler features an optimal fair-queuing
    (proportional-share) scheduling algorithm, enriched with several
    mechanisms to boost throughput and reduce latency for interactive and
    real-time applications. This makes BFQ a large and complex piece of
    code. This commit addresses this issue by splitting BFQ into three
    main, independent components, and by moving each component into a
    separate source file:
    1. Main algorithm: handles the interaction with the kernel, and
    decides which requests to dispatch; it uses the following two further
    components to achieve its goals.
    2. Scheduling engine (Hierarchical B-WF2Q+ scheduling algorithm):
    computes the schedule, using weights and budgets provided by the above
    component.
    3. cgroups support: handles group operations (creation, destruction,
    move, ...).

    Signed-off-by: Paolo Valente
    Signed-off-by: Jens Axboe

    Paolo Valente
     
    We tag as v0 the version of BFQ containing only BFQ's engine plus
    hierarchical support. BFQ's engine is introduced by this commit, while
    hierarchical support is added by the next commit. We use the v0 tag to
    distinguish this minimal version of BFQ from the versions that also
    contain the features and improvements added by later commits. BFQ-v0
    coincides with the version of BFQ submitted a few years ago [1], apart
    from the introduction of preemption, described below.

    BFQ is a proportional-share I/O scheduler, whose general structure,
    plus a lot of code, are borrowed from CFQ.

    - Each process doing I/O on a device is associated with a weight and a
    (bfq_)queue.

    - BFQ grants exclusive access to the device, for a while, to one queue
    (process) at a time, and implements this service model by
    associating every queue with a budget, measured in number of
    sectors.

    - After a queue is granted access to the device, the budget of the
    queue is decremented, on each request dispatch, by the size of the
    request.

    - The in-service queue is expired, i.e., its service is suspended,
    only if one of the following events occurs: 1) the queue finishes
    its budget, 2) the queue empties, 3) a "budget timeout" fires.

    - The budget timeout prevents processes doing random I/O from
    holding the device for too long and dramatically reducing
    throughput.

    - Actually, as in CFQ, a queue associated with a process issuing
    sync requests may not be expired immediately when it empties. In
    contrast, BFQ may idle the device for a short time interval,
    giving the process the chance to go on being served if it issues
    a new request in time. Device idling typically boosts the
    throughput on rotational devices, if processes do synchronous
    and sequential I/O. In addition, under BFQ, device idling is
    also instrumental in guaranteeing the desired throughput
    fraction to processes issuing sync requests (see [2] for
    details).

    - With respect to idling for service guarantees, if several
    processes are competing for the device at the same time, but
    all processes (and groups, after the following commit) have
    the same weight, then BFQ guarantees the expected throughput
    distribution without ever idling the device. Throughput is
    thus as high as possible in this common scenario.

    - Queues are scheduled according to a variant of WF2Q+, named
    B-WF2Q+, and implemented using an augmented rb-tree to preserve an
    O(log N) overall complexity. See [2] for more details. B-WF2Q+ is
    also ready for hierarchical scheduling. However, for a cleaner
    logical breakdown, the code that enables and completes
    hierarchical support is provided in the next commit, which focuses
    exactly on this feature.

    - B-WF2Q+ guarantees a tight deviation with respect to an ideal,
    perfectly fair, and smooth service. In particular, B-WF2Q+
    guarantees that each queue receives a fraction of the device
    throughput proportional to its weight, even if the throughput
    fluctuates, and regardless of: the device parameters, the current
    workload and the budgets assigned to the queue.

    - The last, budget-independence, property (although probably
    counterintuitive in the first place) is definitely beneficial, for
    the following reasons:

    - First, with any proportional-share scheduler, the maximum
    deviation with respect to an ideal service is proportional to
    the maximum budget (slice) assigned to queues. As a consequence,
    BFQ can keep this deviation tight not only because of the
    accurate service of B-WF2Q+, but also because BFQ *does not*
    need to assign a larger budget to a queue to let the queue
    receive a higher fraction of the device throughput.

    - Second, BFQ is free to choose, for every process (queue), the
    budget that best fits the needs of the process, or best
    leverages the I/O pattern of the process. In particular, BFQ
    updates queue budgets with a simple feedback-loop algorithm that
    allows a high throughput to be achieved, while still providing
    tight latency guarantees to time-sensitive applications. When
    the in-service queue expires, this algorithm computes the next
    budget of the queue so as to:

    - Let large budgets be eventually assigned to the queues
    associated with I/O-bound applications performing sequential
    I/O: in fact, the longer these applications are served once they
    get access to the device, the higher the throughput is.

    - Let small budgets be eventually assigned to the queues
    associated with time-sensitive applications (which typically
    perform sporadic and short I/O), because, the smaller the
    budget assigned to a queue waiting for service is, the sooner
    B-WF2Q+ will serve that queue (Subsec 3.3 in [2]).

    - Weights can be assigned to processes only indirectly, through I/O
    priorities, and according to the relation:
    weight = 10 * (IOPRIO_BE_NR - ioprio)
    (a small worked example follows this list). The next patch provides,
    instead, a cgroups interface through which weights can be assigned
    explicitly.

    - If several processes are competing for the device at the same time,
    but all processes and groups have the same weight, then BFQ
    guarantees the expected throughput distribution without ever idling
    the device. It uses preemption instead. Throughput is then much
    higher in this common scenario.

    - ioprio classes are served in strict priority order, i.e.,
    lower-priority queues are not served as long as there are
    higher-priority queues. Among queues in the same class, the
    bandwidth is distributed in proportion to the weight of each
    queue. A very thin extra bandwidth is however guaranteed to the Idle
    class, to prevent it from starving.

    - If the strict_guarantees parameter is set (default: unset), then BFQ
    - always performs idling when the in-service queue becomes empty;
    - forces the device to serve one I/O request at a time, by
    dispatching a new request only if there is no outstanding
    request.
    In the presence of differentiated weights or I/O-request sizes,
    both the above conditions are needed to guarantee that every
    queue receives its allotted share of the bandwidth (see
    Documentation/block/bfq-iosched.txt for more details). Setting
    strict_guarantees may evidently affect throughput.
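
    As a small worked example of the weight relation given in the list above
    (assuming the kernel's IOPRIO_BE_NR value of 8):

    #include <stdio.h>

    #define IOPRIO_BE_NR 8  /* number of best-effort I/O priority levels */

    int main(void)
    {
        int ioprio;

        /* weight = 10 * (IOPRIO_BE_NR - ioprio): ioprio 0 -> 80, ioprio 7 -> 10 */
        for (ioprio = 0; ioprio < IOPRIO_BE_NR; ioprio++)
            printf("ioprio %d -> weight %d\n",
                   ioprio, 10 * (IOPRIO_BE_NR - ioprio));
        return 0;
    }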

    [1] https://lkml.org/lkml/2008/4/1/234
    https://lkml.org/lkml/2008/11/11/148

    [2] P. Valente and M. Andreolini, "Improving Application
    Responsiveness with the BFQ Disk I/O Scheduler", Proceedings of
    the 5th Annual International Systems and Storage Conference
    (SYSTOR '12), June 2012.
    Slightly extended version:
    http://algogroup.unimore.it/people/paolo/disk_sched/bfq-v1-suite-results.pdf

    Signed-off-by: Fabio Checconi
    Signed-off-by: Paolo Valente
    Signed-off-by: Arianna Avanzini
    Signed-off-by: Jens Axboe

    Paolo Valente
     

15 Apr, 2017

1 commit

  • The Kyber I/O scheduler is an I/O scheduler for fast devices designed to
    scale to multiple queues. Users configure only two knobs, the target
    read and synchronous write latencies, and the scheduler tunes itself to
    achieve that latency goal.

    The implementation is based on "tokens", built on top of the scalable
    bitmap library. Tokens serve as a mechanism for limiting requests. There
    are two tiers of tokens: queueing tokens and dispatch tokens.

    A queueing token is required to allocate a request. In fact, these
    tokens are actually the blk-mq internal scheduler tags, but the
    scheduler manages the allocation directly in order to implement its
    policy.

    Dispatch tokens are device-wide and split up into two scheduling
    domains: reads vs. writes. Each hardware queue dispatches batches
    round-robin between the scheduling domains as long as tokens are
    available for that domain.

    These tokens can be used as the mechanism to enable various policies.
    The policy Kyber uses is inspired by active queue management techniques
    for network routing, similar to blk-wbt. The scheduler monitors
    latencies and scales the number of dispatch tokens accordingly. Queueing
    tokens are used to prevent starvation of synchronous requests by
    asynchronous requests.

    Various extensions are possible, including better heuristics and ionice
    support. The new scheduler isn't set as the default yet.
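
    A minimal user-space sketch of selecting Kyber and setting the two knobs
    might look as follows (the nvme0n1 device name and the target values are
    made up, and the read_lat_nsec/write_lat_nsec attribute names under the
    iosched directory are assumptions to verify on a given kernel):

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical helper: write a value to a sysfs attribute. */
    static void sysfs_write(const char *path, const char *val)
    {
        FILE *f = fopen(path, "w");

        if (!f) {
            perror(path);
            exit(1);
        }
        fprintf(f, "%s\n", val);
        fclose(f);
    }

    int main(void)
    {
        /* Switch the example device to the kyber scheduler. */
        sysfs_write("/sys/block/nvme0n1/queue/scheduler", "kyber");
        /* Assumed tunables: target latencies in nanoseconds (2ms read, 10ms write). */
        sysfs_write("/sys/block/nvme0n1/queue/iosched/read_lat_nsec", "2000000");
        sysfs_write("/sys/block/nvme0n1/queue/iosched/write_lat_nsec", "10000000");
        return 0;
    }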

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
     

03 Mar, 2017

1 commit

  • Pull vhost updates from Michael Tsirkin:
    "virtio, vhost: optimizations, fixes

    Looks like a quiet cycle for vhost/virtio, just a couple of minor
    tweaks. Most notable is automatic interrupt affinity for blk and scsi.
    Hopefully other devices are not far behind"

    * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
    virtio-console: avoid DMA from stack
    vhost: introduce O(1) vq metadata cache
    virtio_scsi: use virtio IRQ affinity
    virtio_blk: use virtio IRQ affinity
    blk-mq: provide a default queue mapping for virtio device
    virtio: provide a method to get the IRQ affinity mask for a virtqueue
    virtio: allow drivers to request IRQ affinity when creating VQs
    virtio_pci: simplify MSI-X setup
    virtio_pci: don't duplicate the msix_enable flag in struct pci_dev
    virtio_pci: use shared interrupts for virtqueues
    virtio_pci: remove struct virtio_pci_vq_info
    vhost: try avoiding avail index access when getting descriptor
    virtio_mmio: expose header to userspace

    Linus Torvalds
     

28 Feb, 2017

1 commit


18 Feb, 2017

1 commit


07 Feb, 2017

1 commit

    This patch implements the necessary logic to bring an Opal
    enabled drive out of a factory-enabled state into a working
    Opal state.

    This patch set also enables logic to save a password to
    be replayed during a resume from suspend.

    Signed-off-by: Scott Bauer
    Signed-off-by: Rafael Antognolli
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Scott Bauer
     

01 Feb, 2017

1 commit


28 Jan, 2017

1 commit

  • This fixes a couple of problems:

    1. In the !CONFIG_DEBUG_FS case, the stub definitions were bogus.
    2. In the !CONFIG_BLOCK case, blk-mq-debugfs.c shouldn't be compiled at
    all.

    Fix the stub definitions and add a CONFIG_BLK_DEBUG_FS Kconfig option.

    Fixes: 07e4fead45e6 ("blk-mq: create debugfs directory tree")
    Signed-off-by: Omar Sandoval

    Augment Kconfig description.

    Signed-off-by: Jens Axboe

    Omar Sandoval
     

27 Jan, 2017

1 commit

  • In preparation for putting blk-mq debugging information in debugfs,
    create a directory tree mirroring the one in sysfs:

    # tree -d /sys/kernel/debug/block
    /sys/kernel/debug/block
    |-- nvme0n1
    |   `-- mq
    |       |-- 0
    |       |   `-- cpu0
    |       |-- 1
    |       |   `-- cpu1
    |       |-- 2
    |       |   `-- cpu2
    |       `-- 3
    |           `-- cpu3
    `-- vda
        `-- mq
            `-- 0
                |-- cpu0
                |-- cpu1
                |-- cpu2
                `-- cpu3

    Also add the scaffolding for the actual files that will go in here,
    either under the hardware queue or software queue directories.

    Reviewed-by: Hannes Reinecke
    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
     

18 Jan, 2017

2 commits

  • This is basically identical to deadline-iosched, except it registers
    as a MQ capable scheduler. This is still a single queue design.

    Signed-off-by: Jens Axboe
    Reviewed-by: Bart Van Assche
    Reviewed-by: Omar Sandoval

    Jens Axboe
     
  • This adds a set of hooks that intercepts the blk-mq path of
    allocating/inserting/issuing/completing requests, allowing
    us to develop a scheduler within that framework.

    We reuse the existing elevator scheduler API on the registration
    side, but augment that with the scheduler flagging support for
    the blk-mq interface, and with a separate set of ops hooks for MQ
    devices.

    We split driver and scheduler tags, so we can run the scheduling
    independently of device queue depth.

    Signed-off-by: Jens Axboe
    Reviewed-by: Bart Van Assche
    Reviewed-by: Omar Sandoval

    Jens Axboe
     

11 Nov, 2016

2 commits

  • We can hook this up to the block layer, to help throttle buffered
    writes.

    wbt registers a few trace points that can be used to track what is
    happening in the system:

    wbt_lat: 259:0: latency 2446318
    wbt_stat: 259:0: rmean=2446318, rmin=2446318, rmax=2446318, rsamples=1,
    wmean=518866, wmin=15522, wmax=5330353, wsamples=57
    wbt_step: 259:0: step down: step=1, window=72727272, background=8, normal=16, max=32

    This shows a sync issue event (wbt_lat) that exceeded its time. wbt_stat
    dumps the current read/write stats for that window, and wbt_step shows a
    step down event where we now scale back writes. Each trace includes the
    device, 259:0 in this case.
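
    To watch these events on a live system, the wbt trace group can be enabled
    and the trace pipe read; a hedged sketch, assuming the usual tracefs layout
    under /sys/kernel/debug/tracing:

    #include <stdio.h>

    int main(void)
    {
        char line[512];
        FILE *f = fopen("/sys/kernel/debug/tracing/events/wbt/enable", "w");

        if (!f) {
            perror("enable wbt events");
            return 1;
        }
        fputs("1\n", f);
        fclose(f);

        /* Stream wbt_lat/wbt_stat/wbt_step events as they are emitted. */
        f = fopen("/sys/kernel/debug/tracing/trace_pipe", "r");
        if (!f) {
            perror("trace_pipe");
            return 1;
        }
        while (fgets(line, sizeof(line), f))
            fputs(line, stdout);
        fclose(f);
        return 0;
    }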

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • For legacy block, we simply track them in the request queue. For
    blk-mq, we track them on a per-sw queue basis, which we can then
    sum up through the hardware queues and finally to a per device
    state.

    The stats are tracked in, roughly, 0.1s interval windows.

    Add sysfs files to display the stats.

    The feature is off by default, to avoid any extra overhead. In-kernel
    users of it can turn it on by setting QUEUE_FLAG_STATS in the queue
    flags. We currently don't turn it on if someone just reads any of
    the stats files; that is something we could add as well.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

19 Oct, 2016

1 commit

  • Implement zoned block device zone information reporting and reset.
    Zone information is reported as struct blk_zone. This implementation
    does not differentiate between host-aware and host-managed device
    models and is valid for both. Two functions are provided:
    blkdev_report_zones for discovering the zone configuration of a
    zoned block device, and blkdev_reset_zones for resetting the write
    pointer of sequential zones. The helper functions blk_queue_zone_size
    and bdev_zone_size are also provided for, as their names suggest,
    obtaining the zone size (in 512B sectors) of the zones of the device.
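
    From user space, the corresponding functionality is reachable through the
    zoned block device ioctls added in the same development cycle (an assumption
    here); a hedged sketch using BLKREPORTZONE, with a made-up /dev/sdb device
    path and zone count:

    #include <stdio.h>
    #include <stdlib.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/blkzoned.h>

    int main(void)
    {
        unsigned int i, nr_zones = 16;
        struct blk_zone_report *rep;
        int fd = open("/dev/sdb", O_RDONLY);  /* example zoned device */

        if (fd < 0) {
            perror("open");
            return 1;
        }
        rep = calloc(1, sizeof(*rep) + nr_zones * sizeof(struct blk_zone));
        rep->sector = 0;           /* report starting from the first zone */
        rep->nr_zones = nr_zones;  /* room for up to 16 zone descriptors */

        if (ioctl(fd, BLKREPORTZONE, rep) < 0) {
            perror("BLKREPORTZONE");
            return 1;
        }
        for (i = 0; i < rep->nr_zones; i++)
            printf("zone %u: start %llu len %llu wp %llu type %u\n", i,
                   (unsigned long long)rep->zones[i].start,
                   (unsigned long long)rep->zones[i].len,
                   (unsigned long long)rep->zones[i].wp,
                   rep->zones[i].type);
        free(rep);
        close(fd);
        return 0;
    }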

    Signed-off-by: Hannes Reinecke

    [Damien: * Removed the zone cache
    * Implement report zones operation based on earlier proposal
    by Shaun Tancheff ]
    Signed-off-by: Damien Le Moal
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Martin K. Petersen
    Reviewed-by: Shaun Tancheff
    Tested-by: Shaun Tancheff
    Signed-off-by: Jens Axboe

    Hannes Reinecke
     

10 Oct, 2016

1 commit


22 Sep, 2016

1 commit


19 Sep, 2016

1 commit


15 Sep, 2016

1 commit


24 Jan, 2016

1 commit

  • Pull rdma updates from Doug Ledford:
    "Initial roundup of 4.5 merge window patches

    - Remove usage of ib_query_device and instead store attributes in
    ib_device struct

    - Move iopoll out of block and into lib, rename to irqpoll, and use
    in several places in the rdma stack as our new completion queue
    polling library mechanism. Update the other block drivers that
    already used iopoll to use the new mechanism too.

    - Replace the per-entry GID table locks with a single GID table lock

    - IPoIB multicast cleanup

    - Cleanups to the IB MR facility

    - Add support for 64bit extended IB counters

    - Fix for netlink oops while parsing RDMA nl messages

    - RoCEv2 support for the core IB code

    - mlx4 RoCEv2 support

    - mlx5 RoCEv2 support

    - Cross Channel support for mlx5

    - Timestamp support for mlx5

    - Atomic support for mlx5

    - Raw QP support for mlx5

    - MAINTAINERS update for mlx4/mlx5

    - Misc ocrdma, qib, nes, usNIC, cxgb3, cxgb4, mlx4, mlx5 updates

    - Add support for remote invalidate to the iSER driver (pushed
    through the RDMA tree due to dependencies, acknowledged by nab)

    - Update to NFSoRDMA (pushed through the RDMA tree due to
    dependencies, acknowledged by Bruce)"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (169 commits)
    IB/mlx5: Unify CQ create flags check
    IB/mlx5: Expose Raw Packet QP to user space consumers
    {IB, net}/mlx5: Move the modify QP operation table to mlx5_ib
    IB/mlx5: Support setting Ethernet priority for Raw Packet QPs
    IB/mlx5: Add Raw Packet QP query functionality
    IB/mlx5: Add create and destroy functionality for Raw Packet QP
    IB/mlx5: Refactor mlx5_ib_qp to accommodate other QP types
    IB/mlx5: Allocate a Transport Domain for each ucontext
    net/mlx5_core: Warn on unsupported events of QP/RQ/SQ
    net/mlx5_core: Add RQ and SQ event handling
    net/mlx5_core: Export transport objects
    IB/mlx5: Expose CQE version to user-space
    IB/mlx5: Add CQE version 1 support to user QPs and SRQs
    IB/mlx5: Fix data validation in mlx5_ib_alloc_ucontext
    IB/sa: Fix netlink local service GFP crash
    IB/srpt: Remove redundant wc array
    IB/qib: Improve ipoib UD performance
    IB/mlx4: Advertise RoCE v2 support
    IB/mlx4: Create and use another QP1 for RoCEv2
    IB/mlx4: Enable send of RoCE QP1 packets with IP/UDP headers
    ...

    Linus Torvalds
     

09 Jan, 2016

1 commit

  • Take the core badblocks implementation from md, and make it generally
    available. This follows the same style as kernel implementations of
    linked lists, rb-trees etc, where you can have a structure that can be
    embedded anywhere, and accessor functions to manipulate the data.

    The only changes in this copy of the code are ones to generalize
    function/variable names from md-specific ones. Also add init and free
    functions.

    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
     

12 Dec, 2015

1 commit


27 Sep, 2014

1 commit

  • The T10 Protection Information format is also used by some devices that
    do not go through the SCSI layer (virtual block devices, NVMe). Relocate
    the relevant functions to a block layer library that can be used without
    involving SCSI.

    Signed-off-by: Martin K. Petersen
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     

20 May, 2014

2 commits


19 May, 2014

1 commit


25 Oct, 2013

1 commit

  • Linux currently has two models for block devices:

    - The classic request_fn based approach, where drivers use struct
    request units for IO. The block layer provides various helper
    functionalities to let drivers share code, things like tag
    management, timeout handling, queueing, etc.

    - The "stacked" approach, where a driver squeezes in between the
    block layer and IO submitter. Since this bypasses the IO stack,
    drivers generally have to manage everything themselves.

    With drivers being written for new high IOPS devices, the classic
    request_fn based driver doesn't work well enough. The design dates
    back to when both SMP and high IOPS were rare. It has problems with
    scaling to bigger machines, and runs into scaling issues even on
    smaller machines when you have IOPS in the hundreds of thousands
    per device.

    The stacked approach is then most often selected as the model
    for the driver. But this means that everybody has to re-invent
    everything, and along with that we get all the problems again
    that the shared approach solved.

    This commit introduces blk-mq, block multi queue support. The
    design is centered around per-cpu queues for queueing IO, which
    then funnel down into x number of hardware submission queues.
    We might have a 1:1 mapping between the two, or it might be
    an N:M mapping. That all depends on what the hardware supports.

    blk-mq provides various helper functions, which include:

    - Scalable support for request tagging. Most devices need to
    be able to uniquely identify a request both in the driver and
    to the hardware. The tagging uses per-cpu caches for freed
    tags, to enable cache hot reuse.

    - Timeout handling without tracking requests on a per-device
    basis. Basically, the driver should be able to get a notification
    if a request happens to fail.

    - Optional support for non 1:1 mappings between issue and
    submission queues. blk-mq can redirect IO completions to the
    desired location.

    - Support for per-request payloads. Drivers almost always need
    to associate a request structure with some driver private
    command structure. Drivers can tell blk-mq this at init time,
    and then any request handed to the driver will have the
    required size of memory associated with it.

    - Support for merging of IO, and plugging. The stacked model
    gets neither of these. Even for high IOPS devices, merging
    sequential IO reduces per-command overhead and thus
    increases bandwidth.

    For now, this is provided as a potential 3rd queueing model, with
    the hope being that, as it matures, it can replace both the classic
    and stacked model. That would get us back to having just 1 real
    model for block devices, leaving the stacked approach to dm/md
    devices (as it was originally intended).
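
    As a conceptual sketch of the N:M funneling described above (this is not
    the kernel's actual mapping code, which also takes CPU topology into
    account), per-cpu software queues can be spread over a smaller set of
    hardware queues roughly like this:

    #include <stdio.h>

    /* Toy mapping of a per-cpu software queue to one of N hardware queues. */
    static unsigned int swq_to_hwq(unsigned int cpu, unsigned int nr_hw_queues)
    {
        return cpu % nr_hw_queues;
    }

    int main(void)
    {
        unsigned int cpu, nr_cpus = 8, nr_hw_queues = 3;

        for (cpu = 0; cpu < nr_cpus; cpu++)
            printf("cpu %u software queue -> hardware queue %u\n",
                   cpu, swq_to_hwq(cpu, nr_hw_queues));
        return 0;
    }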

    Contributions in this patch from the following people:

    Shaohua Li
    Alexander Gordeev
    Christoph Hellwig
    Mike Christie
    Matias Bjorling
    Jeff Moyer

    Acked-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Jens Axboe
     

01 Oct, 2013

1 commit

  • Recently commit bab55417b10c ("block: support embedded device command
    line partition") introduced CONFIG_CMDLINE_PARSER. However, that name
    is too generic and sounds like it enables/disables generic kernel boot
    arg processing, when it really is block specific.

    Before this option becomes a part of a full/final release, add the BLK_
    prefix to it so that it is clear in absence of any other context that it
    is block specific.

    In addition, fix up the following less critical items:
    - help text was not really at all helpful.
    - index file for Documentation was not updated
    - add the new arg to Documentation/kernel-parameters.txt
    - clarify wording in source comments

    Signed-off-by: Paul Gortmaker
    Cc: Jens Axboe
    Cc: Cai Zhiyong
    Cc: Wei Yongjun
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Gortmaker
     

12 Sep, 2013

1 commit

    Read the block device partition table from the kernel command line. This
    is intended for fixed block devices (eMMC) in embedded systems: there is
    no MBR, which saves storage space, the bootloader can easily access data
    on the block device by absolute address, and users can easily change the
    partition layout.

    This code is modelled on the MTD command line partition parser, source
    "drivers/mtd/cmdlinepart.c". For details on the partition syntax, see
    "Documentation/block/cmdline-partition.txt".

    [akpm@linux-foundation.org: fix printk text]
    [yongjun_wei@trendmicro.com.cn: fix error return code in parse_parts()]
    Signed-off-by: Cai Zhiyong
    Cc: Karel Zak
    Cc: "Wanglin (Albert)"
    Cc: Marius Groeger
    Cc: David Woodhouse
    Cc: Jens Axboe
    Cc: Brian Norris
    Cc: Artem Bityutskiy
    Signed-off-by: Wei Yongjun
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cai Zhiyong
     

04 Jan, 2012

2 commits


01 Aug, 2011

1 commit

  • This moves the FC classes bsg code to the block layer and
    makes it a lib so that other classes like iscsi and SAS can use it.

    It is helpful because working with the request queue, bios,
    creating scatterlists, etc. is a pain that the LLD does not
    have to worry about for normal IOs and should not have to
    worry about for bsg requests.

    Signed-off-by: Mike Christie
    Signed-off-by: Jens Axboe

    Mike Christie
     

23 Oct, 2010

1 commit

  • * 'for-2.6.37/barrier' of git://git.kernel.dk/linux-2.6-block: (46 commits)
    xen-blkfront: disable barrier/flush write support
    Added blk-lib.c and blk-barrier.c was renamed to blk-flush.c
    block: remove BLKDEV_IFL_WAIT
    aic7xxx_old: removed unused 'req' variable
    block: remove the BH_Eopnotsupp flag
    block: remove the BLKDEV_IFL_BARRIER flag
    block: remove the WRITE_BARRIER flag
    swap: do not send discards as barriers
    fat: do not send discards as barriers
    ext4: do not send discards as barriers
    jbd2: replace barriers with explicit flush / FUA usage
    jbd2: Modify ASYNC_COMMIT code to not rely on queue draining on barrier
    jbd: replace barriers with explicit flush / FUA usage
    nilfs2: replace barriers with explicit flush / FUA usage
    reiserfs: replace barriers with explicit flush / FUA usage
    gfs2: replace barriers with explicit flush / FUA usage
    btrfs: replace barriers with explicit flush / FUA usage
    xfs: replace barriers with explicit flush / FUA usage
    block: pass gfp_mask and flags to sb_issue_discard
    dm: convey that all flushes are processed as empty
    ...

    Linus Torvalds
     

16 Sep, 2010

1 commit


10 Sep, 2010

1 commit