01 Sep, 2017

2 commits

  • Many users have reported the lack of a HOWTO for properly configuring
    bfq as a function of the goal one wants to achieve (maximum
    responsiveness, maximum throughput, ...). All the needed details are
    in fact already provided in the documentation file bfq-iosched.txt,
    but that document lacks guidance on which parameter descriptions to
    look at. This commit adds some simple direction.

    Signed-off-by: Paolo Valente
    Reviewed-by: Jeremy Hickman
    Reviewed-by: Laurentiu Nicola
    Signed-off-by: Jens Axboe

    Paolo Valente
     
  • In addition to containing some typos and stale sentences, the file
    bfq-iosched.txt still mentioned a set of sysfs parameters that have
    been removed from this version of bfq. This commit fixes all these
    issues.

    Signed-off-by: Paolo Valente
    Reviewed-by: Jeremy Hickman
    Reviewed-by: Laurentiu Nicola
    Signed-off-by: Jens Axboe

    Paolo Valente
     

04 Jul, 2017

1 commit

  • Currently all integrity prep hooks are open-coded, and if preparation
    fails we ignore its error code and fail the bio with EIO. Let's return
    the real error to the upper layer, so the caller can react accordingly.

    In fact, no one wants to use bio_integrity_prep() without
    bio_integrity_enabled, so it is reasonable to fold the two into one
    function.
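
    The merged calling convention returns a bool; a minimal sketch of how
    a dispatcher might use it (the caller name is made up):

      #include <linux/bio.h>

      /* Sketch of the post-change convention: bio_integrity_prep() now
       * also performs the old bio_integrity_enabled() check and returns
       * true when the bio may proceed, false when it has already been
       * completed with the real error instead of a blanket EIO.
       */
      static void my_submit_path(struct bio *bio)
      {
              if (!bio_integrity_prep(bio))
                      return;         /* error already propagated upward */

              /* ... continue dispatching the bio as usual ... */
      }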

    Signed-off-by: Dmitry Monakhov
    Reviewed-by: Martin K. Petersen
    [hch: merged with the latest block tree,
    return bool from bio_integrity_prep]
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Dmitry Monakhov
     

19 Jun, 2017

1 commit

  • bio_clone() is no longer used; only bio_clone_bioset() and
    bio_clone_fast() are. This is for the best, as bio_clone() used
    fs_bio_set, and filesystems are unlikely to want to use bio_clone().

    So remove bio_clone() and all references.
    This includes a fix to some incorrect documentation.
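
    For callers that previously reached for bio_clone(), a rough sketch of
    the intended replacement, cloning from a caller-owned bio_set rather
    than fs_bio_set (the wrapper and bio_set are invented for
    illustration):

      #include <linux/bio.h>
      #include <linux/gfp.h>

      /* Sketch: a stacking driver clones a bio out of its own bio_set
       * instead of using the removed bio_clone()/fs_bio_set combination.
       * The clone shares the bvec table of the source bio.
       */
      static struct bio *my_clone(struct bio *src, struct bio_set *my_bs)
      {
              struct bio *clone = bio_clone_fast(src, GFP_NOIO, my_bs);

              if (!clone)
                      return NULL;

              /* the caller sets its own bi_end_io / bi_private before submitting */
              return clone;
      }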

    Reviewed-by: Christoph Hellwig
    Reviewed-by: Ming Lei
    Signed-off-by: NeilBrown
    Signed-off-by: Jens Axboe

    NeilBrown
     

10 May, 2017

1 commit

  • The introduction of the BFQ and Kyber I/O schedulers has triggered a
    new wave of I/O benchmarks. Unfortunately, comments and discussions on
    these benchmarks confirm that there is still little awareness that it
    is very hard to achieve, at the same time, a low latency and a high
    throughput. In particular, virtually all benchmarks measure
    throughput, or throughput-related figures of merit, but, for BFQ, they
    use the scheduler in its default configuration. This configuration is
    geared, instead, toward a low latency. This is evidently a sign that
    BFQ documentation is still too unclear on this important aspect. This
    commit addresses this issue by stressing how BFQ configuration must be
    (easily) changed if the only goal is maximum throughput.
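
    The change boils down to documenting one switch. As a hedged
    illustration (the device path is an example, and it assumes the
    relevant knob is BFQ's low_latency attribute, which this log excerpt
    does not name), the default can be flipped from user space like this:

      #include <stdio.h>

      /* Illustrative only: disable BFQ's low-latency heuristics on one
       * disk when the sole goal is maximum throughput. Adjust "sda" and
       * the path to match your system.
       */
      int main(void)
      {
              const char *path = "/sys/block/sda/queue/iosched/low_latency";
              FILE *f = fopen(path, "w");

              if (!f) {
                      perror(path);
                      return 1;
              }
              fputs("0\n", f);        /* 0 = throughput mode, 1 = default low-latency mode */
              fclose(f);
              return 0;
      }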

    Signed-off-by: Paolo Valente
    Signed-off-by: Jens Axboe

    Paolo Valente
     

19 Apr, 2017

3 commits

  • This patch introduces a simple heuristic to load applications quickly,
    and to perform the I/O requested by interactive applications just as
    quickly. To this purpose, both a newly-created queue and a queue
    associated with an interactive application (we explain in a moment how
    BFQ decides whether the associated application is interactive)
    receive the following two special treatments:

    1) The weight of the queue is raised.

    2) The queue unconditionally enjoys device idling when it empties; in
    fact, if the requests of a queue are sync, then performing device
    idling for the queue is a necessary condition to guarantee that the
    queue receives a fraction of the throughput proportional to its weight
    (see [1] for details).

    For brevity, we refer to the combination of these two preferential
    treatments simply as weight-raising. For a newly-created queue,
    weight-raising starts immediately and lasts for a time interval that:
    1) depends on the device speed and type (rotational or
    non-rotational), and 2) is equal to the time needed to load (start up)
    a large-size application on that device, with cold caches and with no
    additional workload.

    Finally, as for guaranteeing fast execution of interactive,
    I/O-related tasks (such as opening a file), consider that any
    interactive application blocks and waits for user input both after
    starting up and after executing some task. After a while, the user may
    trigger new operations, after which the application stops again, and
    so on. Accordingly, the low-latency heuristic weight-raises again a
    queue in case it becomes backlogged after being idle for a
    sufficiently long (configurable) time. The weight-raising then lasts
    for the same time as for a just-created queue.

    According to our experiments, the combination of this low-latency
    heuristic and of the improvements described in the previous patch
    allows BFQ to guarantee a high application responsiveness.

    [1] P. Valente, A. Avanzini, "Evolution of the BFQ Storage I/O
    Scheduler", Proceedings of the First Workshop on Mobile System
    Technologies (MST-2015), May 2015.
    http://algogroup.unimore.it/people/paolo/disk_sched/mst-2015.pdf
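
    As a deliberately simplified sketch of the heuristic above (all names,
    constants and the raising factor are invented for illustration and do
    not match the BFQ sources):

      #include <linux/jiffies.h>
      #include <linux/types.h>

      /* Invented illustration: weight-raise a queue when it is newly
       * created, or when it becomes backlogged after a sufficiently long
       * idle period; the raising period itself is device-dependent.
       */
      struct my_queue {
              unsigned int    weight;
              bool            weight_raised;          /* also implies unconditional idling */
              unsigned long   raising_expires;        /* jiffies */
              unsigned long   last_idle_start;        /* jiffies */
      };

      static void my_maybe_weight_raise(struct my_queue *q, bool newly_created,
                                        unsigned long raising_duration,
                                        unsigned long min_idle_time)
      {
              bool long_idle =
                      time_is_before_jiffies(q->last_idle_start + min_idle_time);

              if (newly_created || long_idle) {
                      q->weight *= 10;        /* arbitrary factor, for illustration only */
                      q->weight_raised = true;
                      q->raising_expires = jiffies + raising_duration;
              }
      }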

    Signed-off-by: Paolo Valente
    Signed-off-by: Arianna Avanzini
    Signed-off-by: Jens Axboe

    Paolo Valente
     
  • Add complete support for full hierarchical scheduling, with a cgroups
    interface. Full hierarchical scheduling is implemented through the
    'entity' abstraction: both bfq_queues, i.e., the internal BFQ queues
    associated with processes, and groups are represented in general by
    entities. Given the bfq_queues associated with the processes belonging
    to a given group, the entities representing these queues are children of
    the entity representing the group. At higher levels, if a group, say
    G, contains other groups, then the entity representing G is the parent
    entity of the entities representing the groups in G.

    Hierarchical scheduling is performed as follows: if the timestamps of
    a leaf entity (i.e., of a bfq_queue) change, and such a change lets
    the entity become the next-to-serve entity for its parent entity, then
    the timestamps of the parent entity are recomputed as a function of
    the budget of its new next-to-serve leaf entity. If the parent entity
    belongs, in its turn, to a group, and its new timestamps let it become
    the next-to-serve for its parent entity, then the timestamps of the
    latter parent entity are recomputed as well, and so on. When a new
    bfq_queue must be set in service, the reverse path is followed: the
    next-to-serve highest-level entity is chosen, then its next-to-serve
    child entity, and so on, until the next-to-serve leaf entity is
    reached, and the bfq_queue that this entity represents is set in
    service.

    Writeback is accounted for on a per-group basis, i.e., for each group,
    the async I/O requests of the processes of the group are enqueued in a
    distinct bfq_queue, and the entity associated with this queue is a
    child of the entity associated with the group.

    Weights can be assigned explicitly to groups and processes through the
    cgroups interface, unlike what happens for single processes when the
    cgroups interface is not used (as explained in the description of the
    previous patch). In particular, since each node has
    a full scheduler, each group can be assigned its own weight.
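
    The 'entity' abstraction can be pictured roughly as below; this is a
    simplified sketch with invented names, not the actual bfq data
    structures:

      #include <linux/types.h>

      /* Rough sketch of the hierarchy: both queues (leaves) and groups are
       * entities, and every entity points to the entity of its parent
       * group, up to the root.
       */
      struct my_entity {
              struct my_entity        *parent;        /* NULL at the root */
              int                      weight;
              u64                      start, finish; /* B-WF2Q+ timestamps */
              bool                     is_queue;      /* leaf (bfq_queue) or group */
      };

      struct my_group {
              struct my_entity         entity;        /* child of the parent group's entity */
              struct my_entity        *async_queue;   /* per-group queue for the group's async I/O */
      };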

    Signed-off-by: Fabio Checconi
    Signed-off-by: Paolo Valente
    Signed-off-by: Arianna Avanzini
    Signed-off-by: Jens Axboe

    Arianna Avanzini
     
  • We tag as v0 the version of BFQ containing only BFQ's engine plus
    hierarchical support. BFQ's engine is introduced by this commit, while
    hierarchical support is added by the next commit. We use the v0 tag to
    distinguish this minimal version of BFQ from the versions containing
    also the features and improvements added by the next commits. BFQ-v0
    coincides with the version of BFQ submitted a few years ago [1], apart
    from the introduction of preemption, described below.

    BFQ is a proportional-share I/O scheduler, whose general structure,
    plus a lot of code, are borrowed from CFQ.

    - Each process doing I/O on a device is associated with a weight and a
    (bfq_)queue.

    - BFQ grants exclusive access to the device, for a while, to one queue
    (process) at a time, and implements this service model by
    associating every queue with a budget, measured in number of
    sectors.

    - After a queue is granted access to the device, the budget of the
    queue is decremented, on each request dispatch, by the size of the
    request.

    - The in-service queue is expired, i.e., its service is suspended,
    only if one of the following events occurs: 1) the queue finishes
    its budget, 2) the queue empties, 3) a "budget timeout" fires.

    - The budget timeout prevents processes doing random I/O from
    holding the device for too long and dramatically reducing
    throughput.

    - Actually, as in CFQ, a queue associated with a process issuing
    sync requests may not be expired immediately when it empties. Instead,
    BFQ may idle the device for a short time interval,
    giving the process the chance to go on being served if it issues
    a new request in time. Device idling typically boosts the
    throughput on rotational devices, if processes do synchronous
    and sequential I/O. In addition, under BFQ, device idling is
    also instrumental in guaranteeing the desired throughput
    fraction to processes issuing sync requests (see [2] for
    details).

    - With respect to idling for service guarantees, if several
    processes are competing for the device at the same time, but
    all processes (and groups, after the following commit) have
    the same weight, then BFQ guarantees the expected throughput
    distribution without ever idling the device. Throughput is
    thus as high as possible in this common scenario.

    - Queues are scheduled according to a variant of WF2Q+, named
    B-WF2Q+, and implemented using an augmented rb-tree to preserve an
    O(log N) overall complexity. See [2] for more details. B-WF2Q+ is
    also ready for hierarchical scheduling. However, for a cleaner
    logical breakdown, the code that enables and completes
    hierarchical support is provided in the next commit, which focuses
    exactly on this feature.

    - B-WF2Q+ guarantees a tight deviation with respect to an ideal,
    perfectly fair, and smooth service. In particular, B-WF2Q+
    guarantees that each queue receives a fraction of the device
    throughput proportional to its weight, even if the throughput
    fluctuates, and regardless of: the device parameters, the current
    workload and the budgets assigned to the queue.

    - The last property, budget independence (although probably
    counterintuitive at first), is definitely beneficial, for
    the following reasons:

    - First, with any proportional-share scheduler, the maximum
    deviation with respect to an ideal service is proportional to
    the maximum budget (slice) assigned to queues. As a consequence,
    BFQ can keep this deviation tight not only because of the
    accurate service of B-WF2Q+, but also because BFQ *does not*
    need to assign a larger budget to a queue to let the queue
    receive a higher fraction of the device throughput.

    - Second, BFQ is free to choose, for every process (queue), the
    budget that best fits the needs of the process, or best
    leverages the I/O pattern of the process. In particular, BFQ
    updates queue budgets with a simple feedback-loop algorithm that
    allows a high throughput to be achieved, while still providing
    tight latency guarantees to time-sensitive applications. When
    the in-service queue expires, this algorithm computes the next
    budget of the queue so as to:

    - Let large budgets be eventually assigned to the queues
    associated with I/O-bound applications performing sequential
    I/O: in fact, the longer these applications are served once they
    get access to the device, the higher the throughput is.

    - Let small budgets be eventually assigned to the queues
    associated with time-sensitive applications (which typically
    perform sporadic and short I/O), because, the smaller the
    budget assigned to a queue waiting for service is, the sooner
    B-WF2Q+ will serve that queue (Subsec 3.3 in [2]).

    - Weights can be assigned to processes only indirectly, through I/O
    priorities, and according to the relation:
    weight = 10 * (IOPRIO_BE_NR - ioprio).
    The next patch instead provides a cgroups interface through which
    weights can be assigned explicitly (a small illustrative mapping is
    sketched after the references below).

    - If several processes are competing for the device at the same time,
    but all processes and groups have the same weight, then BFQ
    guarantees the expected throughput distribution without ever idling
    the device. It uses preemption instead. Throughput is then much
    higher in this common scenario.

    - ioprio classes are served in strict priority order, i.e.,
    lower-priority queues are not served as long as there are
    higher-priority queues. Among queues in the same class, the
    bandwidth is distributed in proportion to the weight of each
    queue. A very thin extra bandwidth is however guaranteed to the Idle
    class, to prevent it from starving.

    - If the strict_guarantees parameter is set (default: unset), then BFQ
    - always performs idling when the in-service queue becomes empty;
    - forces the device to serve one I/O request at a time, by
    dispatching a new request only if there is no outstanding
    request.
    In the presence of differentiated weights or I/O-request sizes,
    both the above conditions are needed to guarantee that every
    queue receives its allotted share of the bandwidth (see
    Documentation/block/bfq-iosched.txt for more details). Setting
    strict_guarantees may evidently affect throughput.

    [1] https://lkml.org/lkml/2008/4/1/234
    https://lkml.org/lkml/2008/11/11/148

    [2] P. Valente and M. Andreolini, "Improving Application
    Responsiveness with the BFQ Disk I/O Scheduler", Proceedings of
    the 5th Annual International Systems and Storage Conference
    (SYSTOR '12), June 2012.
    Slightly extended version:
    http://algogroup.unimore.it/people/paolo/disk_sched/bfq-v1-suite-results.pdf
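
    As promised in the list above, a small sketch of the stated
    ioprio-to-weight relation (IOPRIO_BE_NR is the number of best-effort
    priority levels; the helper name is made up):

      #include <linux/ioprio.h>       /* IOPRIO_BE_NR */

      /* The indirect weight assignment described above:
       * weight = 10 * (IOPRIO_BE_NR - ioprio).
       * With IOPRIO_BE_NR == 8, ioprio 0 (highest) maps to weight 80 and
       * ioprio 7 (lowest) maps to weight 10.
       */
      static inline int my_ioprio_to_weight(int ioprio)
      {
              return 10 * (IOPRIO_BE_NR - ioprio);
      }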

    Signed-off-by: Fabio Checconi
    Signed-off-by: Paolo Valente
    Signed-off-by: Arianna Avanzini
    Signed-off-by: Jens Axboe

    Paolo Valente
     

15 Apr, 2017

1 commit

  • The Kyber I/O scheduler is an I/O scheduler for fast devices designed to
    scale to multiple queues. Users configure only two knobs, the target
    read and synchronous write latencies, and the scheduler tunes itself to
    achieve that latency goal.

    The implementation is based on "tokens", built on top of the scalable
    bitmap library. Tokens serve as a mechanism for limiting requests. There
    are two tiers of tokens: queueing tokens and dispatch tokens.

    A queueing token is required to allocate a request. In fact, these
    tokens are actually the blk-mq internal scheduler tags, but the
    scheduler manages the allocation directly in order to implement its
    policy.

    Dispatch tokens are device-wide and split up into two scheduling
    domains: reads vs. writes. Each hardware queue dispatches batches
    round-robin between the scheduling domains as long as tokens are
    available for that domain.

    These tokens can be used as the mechanism to enable various policies.
    The policy Kyber uses is inspired by active queue management techniques
    for network routing, similar to blk-wbt. The scheduler monitors
    latencies and scales the number of dispatch tokens accordingly. Queueing
    tokens are used to prevent starvation of synchronous requests by
    asynchronous requests.

    Various extensions are possible, including better heuristics and ionice
    support. The new scheduler isn't set as the default yet.
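
    A heavily simplified model of the dispatch side described above, with
    two domains served round-robin while dispatch tokens are available
    (all names are invented; this is not the Kyber source):

      #include <linux/types.h>

      /* Invented illustration: a hardware queue alternates between the
       * read and write scheduling domains, dispatching a batch only when
       * a device-wide dispatch token can be taken for that domain.
       */
      enum my_domain { MY_READ, MY_WRITE, MY_NUM_DOMAINS };

      struct my_hctx {
              enum my_domain   cur_domain;            /* round-robin cursor */
              int              domain_tokens[MY_NUM_DOMAINS];
      };

      static bool my_dispatch_one_batch(struct my_hctx *hctx)
      {
              int i;

              for (i = 0; i < MY_NUM_DOMAINS; i++) {
                      enum my_domain d = (hctx->cur_domain + i) % MY_NUM_DOMAINS;

                      if (hctx->domain_tokens[d] > 0) {
                              hctx->domain_tokens[d]--;       /* take a dispatch token */
                              hctx->cur_domain = (d + 1) % MY_NUM_DOMAINS;
                              /* ... dispatch a batch of requests from domain d ... */
                              return true;
                      }
              }
              return false;   /* no tokens available in any domain right now */
      }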

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
     

28 Mar, 2017

1 commit

  • throtl_slice is important for blk-throttling. It's called a slice
    internally, but it really is the time window over which blk-throttling
    samples data; blk-throttling then makes decisions based on these
    samples. One example is bandwidth measurement: a cgroup's bandwidth is
    measured over the time interval of throtl_slice.

    A small throtl_slice means cgroups get smoother throughput but burn
    more CPU. The default value is 100ms, which is not appropriate for all
    disks; a fast SSD can dispatch a lot of I/Os in 100ms. This patch makes
    the value tunable.

    Since throtl_slice isn't a time slice, the sysfs name
    'throttle_sample_time' reflects its character better.
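
    As a hedged example of using the new knob (the path and the unit are
    assumptions; the attribute is only exposed when blk-throttling support
    is built in):

      #include <stdio.h>

      /* Illustrative only: shrink the sampling window on a fast SSD so
       * throttling decisions react more quickly. Adjust the device name,
       * and check the documentation for the accepted range and unit.
       */
      int main(void)
      {
              FILE *f = fopen("/sys/block/nvme0n1/queue/throttle_sample_time", "w");

              if (!f) {
                      perror("throttle_sample_time");
                      return 1;
              }
              fputs("20\n", f);       /* assumed to be milliseconds, vs. the 100ms default */
              fclose(f);
              return 0;
      }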

    Signed-off-by: Shaohua Li
    Signed-off-by: Jens Axboe

    Shaohua Li
     

23 Feb, 2017

1 commit

  • Pull documentation updates from Jonathan Corbet:
    "A slightly quieter cycle for documentation this time around.

    Three more DocBook template files have been converted to RST; only 21
    to go. There are various build improvements and the usual array of
    documentation improvements and fixes"

    * tag 'docs-4.11' of git://git.lwn.net/linux: (44 commits)
    docs / driver-api: Fix structure references in device_link.rst
    PM / docs: Fix structure references in device.rst
    Add a target to check broken external links in the Documentation
    Documentation: Fix linux-api list typo
    Documentation: DocBook/Makefile comment typo
    Improve sparse documentation
    Documentation: make Makefile.sphinx no-ops quieter
    Documentation: DMA-ISA-LPC.txt
    Documentation: input: fix path to input code definitions
    docs: Remove the copyright year from conf.py
    docs: Fix a warning in the Korean HOWTO.rst translation
    PM / sleep / docs: Convert PM notifiers document to reST
    PM / core / docs: Convert sleep states API document to reST
    PM / core: Update kerneldoc comments in pm.h
    doc-rst: Fix recursive make invocation from macros
    doc-rst: Delete output of failed dot-SVG conversion
    doc-rst: Break shell command sequences on failure
    Documentation/sphinx: make targets independent of Sphinx work for HAVE_SPHINX=0
    doc-rst: fixed cleandoc target when used with O=dir
    Documentation/sphinx: prevent generation of .pyc files in the source tree
    ...

    Linus Torvalds
     

16 Nov, 2016

1 commit

  • If CONFIG_NVM is disabled, loading the null_blk module with
    use_lightnvm=1 fails, but no message or documentation explains the
    failure.

    Add an appropriate error message.

    Signed-off-by: Yasuaki Ishimatsu

    Massaged the text a bit.

    Signed-off-by: Jens Axboe

    Yasuaki Ishimatsu
     

11 Nov, 2016

1 commit

  • Enable throttling of buffered writeback to make it much smoother,
    with far less impact on other system activity.
    Background writeback should be, by definition, background
    activity. The fact that we flush huge bundles of it at a time
    means that it potentially has heavy impacts on foreground workloads,
    which isn't ideal. We can't easily limit the sizes of writes that
    we do, since that would impact file system layout in the presence
    of delayed allocation. So just throttle back buffered writeback,
    unless someone is waiting for it.

    The algorithm for when to throttle takes its inspiration in the
    CoDel networking scheduling algorithm. Like CoDel, blk-wb monitors
    the minimum latencies of requests over a window of time. In that
    window of time, if the minimum latency of any request exceeds a
    given target, then a scale count is incremented and the queue depth
    is shrunk. The next monitoring window is shrunk accordingly. Unlike
    CoDel, if we hit a window that exhibits good behavior, then we
    simply decrement the scale count and re-calculate the limits for that
    scale value. This prevents us from oscillating between a
    close-to-ideal value and max all the time, instead remaining in the
    windows where we get good behavior.

    Unlike CoDel, blk-wb allows the scale count to go negative. This
    happens if we primarily have writes going on. Unlike positive
    scale counts, this doesn't change the size of the monitoring window.
    When the heavy writers finish, blk-wb quickly snaps back to its
    stable state of a zero scale count.

    The patch registers a sysfs entry, 'wb_lat_usec'. This sets the latency
    target to be met. It defaults to 2 msec for non-rotational storage, and
    75 msec for rotational storage. Setting this value to '0' disables
    blk-wb. Generally, a user would not have to touch this setting.

    We don't enable WBT on devices that are managed with CFQ, and have
    a non-root block cgroup attached. If we have a proportional share setup
    on this particular disk, then the wbt throttling will interfere with
    that. We don't have a strong need for wbt for that case, since we will
    rely on CFQ doing that for us.
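
    An invented, much-reduced sketch of the scaling loop described above
    (not the blk-wb code; names and the depth policy are illustrative):

      #include <linux/types.h>

      /* Invented illustration of the CoDel-inspired loop: after each
       * monitoring window, compare the minimum observed latency with the
       * target and step the scale count, which in turn shrinks or restores
       * the allowed background-writeback queue depth.
       */
      struct my_wb_state {
              s64             min_lat_nsec;   /* minimum request latency in the window */
              s64             target_nsec;    /* wb_lat_usec converted to nanoseconds */
              int             scale_step;     /* > 0: throttled, < 0: writes only */
              unsigned int    max_depth;      /* base queue depth */
      };

      static unsigned int my_window_done(struct my_wb_state *s)
      {
              if (s->min_lat_nsec > s->target_nsec)
                      s->scale_step++;        /* target missed: throttle harder */
              else
                      s->scale_step--;        /* good window: step back (may go negative) */

              if (s->scale_step > 0)
                      return s->max_depth >> s->scale_step;   /* shrunken depth */
              return s->max_depth;    /* zero or negative steps: full depth (simplified) */
      }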

    Signed-off-by: Jens Axboe

    Jens Axboe
     

01 Nov, 2016

1 commit

  • Noidle should be the default for writes, as seen by all the compound
    definitions in fs.h using it. In fact only direct I/O really wants
    idling, so turn the whole flag around to get the defaults
    right, which will make our life much easier, especially once the
    WRITE_* defines go away.

    This assumes all the existing "raw" users of REQ_SYNC for writes
    want noidle behavior, which seems to be spot on from a quick audit.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

28 Oct, 2016

2 commits

  • Now that we don't need the common flags to overflow outside the range
    of a 32-bit type we can encode them the same way for both the bio and
    request fields. This in addition allows us to place the operation
    first (and make some room for more ops while we're at it) and to
    stop having to shift around the operation values.

    In addition this allows passing around only one value in the block layer
    instead of two (and eventually also in the file systems, but we can do
    that later), and thus cleans up a lot of code.

    Last but not least this allows decreasing the size of the cmd_flags
    field in struct request to 32-bits. Various functions passing this
    value could also be updated, but I'd like to avoid the churn for now.
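
    A toy illustration of the layout this enables: the operation in the
    low bits of a single 32-bit value, the shared flags above it (bit
    widths and names are made up, not the kernel's actual encoding):

      #include <linux/types.h>

      /* Toy model: one 32-bit value carries the op in its low bits and the
       * common flags above, so bio and request can share the encoding and
       * no shifting of op values is needed.
       */
      #define MY_OP_BITS      8
      #define MY_OP_MASK      ((1u << MY_OP_BITS) - 1)

      static inline unsigned int my_op(u32 opf)
      {
              return opf & MY_OP_MASK;        /* the operation */
      }

      static inline u32 my_flags(u32 opf)
      {
              return opf & ~MY_OP_MASK;       /* the common flags */
      }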

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • A lot of the REQ_* flags are only used on struct requests, and only of
    use to the block layer and a few drivers that dig into struct request
    internals.

    This patch adds a new req_flags_t rq_flags field to struct request for
    them, and thus dramatically shrinks the number of common request flags. It
    also removes the unfortunate situation where we have to fit the fields
    from the same enum into 32 bits for struct bio and 64 bits for
    struct request.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Shaun Tancheff
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

14 Sep, 2016

1 commit

  • commit e1defc4ff0cf57aca6c5e3ff99fa503f5943c1f1
    "block: Do away with the notion of hardsect_size"
    removed the notion of "hardware sector size" from
    the kernel in favor of logical block size, but
    references remain in comments and documentation.

    Update the remaining sites mentioning hardsect.

    Signed-off-by: Linus Walleij
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Linus Walleij
     

08 Aug, 2016

1 commit

  • Since commit 63a4cc24867d, bio->bi_rw contains flags in the lower
    portion and the op code in the higher portions. This means that
    old code that relies on manually setting bi_rw is most likely
    going to be broken. Instead of letting that brokenness linger,
    rename the member, to force old and out-of-tree code to break
    at compile time instead of at runtime.

    No intended functional changes in this commit.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

29 Jul, 2016

1 commit

  • Pull vfs updates from Al Viro:
    "Assorted cleanups and fixes.

    Probably the most interesting part long-term is ->d_init() - that will
    have a bunch of followups in (at least) ceph and lustre, but we'll
    need to sort the barrier-related rules before it can get used for
    really non-trivial stuff.

    Another fun thing is the merge of ->d_iput() callers (dentry_iput()
    and dentry_unlink_inode()) and a bunch of ->d_compare() ones (all
    except the one in __d_lookup_lru())"

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits)
    fs/dcache.c: avoid soft-lockup in dput()
    vfs: new d_init method
    vfs: Update lookup_dcache() comment
    bdev: get rid of ->bd_inodes
    Remove last traces of ->sync_page
    new helper: d_same_name()
    dentry_cmp(): use lockless_dereference() instead of smp_read_barrier_depends()
    vfs: clean up documentation
    vfs: document ->d_real()
    vfs: merge .d_select_inode() into .d_real()
    unify dentry_iput() and dentry_unlink_inode()
    binfmt_misc: ->s_root is not going anywhere
    drop redundant ->owner initializations
    ufs: get rid of redundant checks
    orangefs: constify inode_operations
    missed comment updates from ->direct_IO() prototype change
    file_inode(f)->i_mapping is f->f_mapping
    trim fsnotify hooks a bit
    9p: new helper - v9fs_parent_fid()
    debugfs: ->d_parent is never NULL or negative
    ...

    Linus Torvalds
     

08 Jun, 2016

2 commits

  • To avoid confusion between REQ_OP_FLUSH, which is handled by
    request_fn drivers, and upper layers requesting the block layer
    perform a flush sequence along with possibly a WRITE, this patch
    renames REQ_FLUSH to REQ_PREFLUSH.
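
    For upper layers, the renamed flag ends up being used roughly as in
    the sketch below. This is written against later kernels where the op
    and flags share one bio field; treat the exact field name and flag
    combination as an assumption rather than part of this patch:

      #include <linux/bio.h>
      #include <linux/blk_types.h>

      /* Sketch: an upper layer asking for a flush sequence around a write.
       * REQ_PREFLUSH requests a cache flush before the data is written,
       * while REQ_FUA asks that the data itself reach stable media.
       */
      static void my_mark_barrier_write(struct bio *bio)
      {
              bio->bi_opf = REQ_OP_WRITE | REQ_PREFLUSH | REQ_FUA;
      }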

    Signed-off-by: Mike Christie
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Mike Christie
     
  • This adds a REQ_OP_FLUSH operation that is sent to request_fn
    based drivers by the block layer's flush code, instead of
    sending requests with the request->cmd_flags REQ_FLUSH bit set.

    Signed-off-by: Mike Christie
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Mike Christie
     

20 May, 2016

1 commit

  • Pull Documentation updates from Jon Corbet:
    "A bit busier this time around.

    The most interesting thing (IMO) this time around is some beginning
    infrastructural work to allow documents to be written using
    restructured text. Maybe someday, in a galaxy far far away, we'll be
    able to eliminate the DocBook dependency and have a much better
    integrated set of kernel docs. Someday.

    Beyond that, there's a new document on security hardening from Kees,
    the movement of some sample code over to samples/, a number of
    improvements to the serial docs from Geert, and the usual collection
    of corrections, typo fixes, etc"

    * tag 'docs-for-linus' of git://git.lwn.net/linux: (55 commits)
    doc: self-protection: provide initial details
    serial: doc: Use port->state instead of info
    serial: doc: Always refer to tty_port->mutex
    Documentation: vm: Spelling s/paltform/platform/g
    Documentation/memcg: update kmem limit doc as codes behavior
    docproc: print a comment about autogeneration for rst output
    docproc: add support for reStructuredText format via --rst option
    docproc: abstract terminating lines at first space
    docproc: abstract docproc directive detection
    docproc: reduce unnecessary indentation
    docproc: add variables for subcommand and filename
    kernel-doc: use rst C domain directives and references for types
    kernel-doc: produce RestructuredText output
    kernel-doc: rewrite usage description, remove duplicated comments
    Doc: correct the location of sysrq.c
    Documentation: fix common spelling mistakes
    samples: v4l: from Documentation to samples directory
    samples: connector: from Documentation to samples directory
    Documentation: xillybus: fix spelling mistake
    Documentation: x86: fix spelling mistakes
    ...

    Linus Torvalds
     

13 Apr, 2016

2 commits

  • We don't have any drivers left using it, so kill it off. Update
    documentation to use the newer blk_queue_write_cache().

    Signed-off-by: Jens Axboe
    Reviewed-by: Christoph Hellwig

    Jens Axboe
     
  • Add an internal helper and flag for setting whether a queue has
    write back caching, or write through (or none). Add a sysfs file
    to show this as well, and make it changeable from user space.

    This will replace the (awkward) blk_queue_flush() interface that
    drivers currently use to inform the block layer of write cache state
    and capabilities.
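
    A brief sketch of a driver advertising its cache configuration with
    the new helper (the wrapper function is illustrative; the helper
    itself is the one named above):

      #include <linux/blkdev.h>

      /* Sketch: during probe, tell the block layer whether the device has
       * a volatile write-back cache and whether it supports FUA writes.
       */
      static void my_setup_cache(struct request_queue *q, bool wb_cache, bool fua)
      {
              blk_queue_write_cache(q, wb_cache, fua);
      }

    Passing (q, false, false) would describe a write-through device with
    no FUA support, replacing the old blk_queue_flush() flag dance.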

    Signed-off-by: Jens Axboe
    Reviewed-by: Christoph Hellwig

    Jens Axboe
     

18 Jan, 2016

1 commit

  • Pull documentation updates from Jon Corbet:
    "A relatively boring cycle in the docs tree. There's a few kernel-doc
    fixes and various document tweaks.

    One patch reaches out of the documentation subtree to fix a comment in
    init/do_mounts_rd.c. There didn't seem to be anybody more appropriate
    to take that one, so I accepted it"

    * tag 'docs-4.5' of git://git.lwn.net/linux: (29 commits)
    thermal: add description for integral_cutoff unit
    Documentation: update libhugetlbfs site url
    Documentation: Explain pci=conf1,conf2 more verbosely
    DMA-API: fix confusing sentence in Documentation/DMA-API.txt
    Documentation: translations: update linux cross reference link
    Documentation: fix typo in CodingStyle
    init, Documentation: Remove ramdisk_blocksize mentions
    Documentation-getdelays: Apply a recommendation from "checkpatch.pl" in main()
    Documentation: HOWTO: update versions from 3.x to 4.x
    Documentation: remove outdated references from translations
    Doc: treewide: Fix grammar "a" to "an"
    Documentation: cpu-hotplug: Fix sysfs mount instructions
    can-doc: Add hint about getting timestamps
    Fix CFQ I/O scheduler parameter name in documentation
    Documentation: arm: remove dead links from Marvell Berlin docs
    Documentation: HOWTO: update code cross reference link
    Doc: Docbook/iio: Fix typo in iio.tmpl
    DocBook: make index.html generation less verbose by default
    DocBook: Cleanup: remove an unused $(call) line
    DocBook: Add a help message for DOCBOOKS env var
    ...

    Linus Torvalds
     

17 Nov, 2015

1 commit

  • Add support for registering as a LightNVM device. This allows us to
    evaluate the performance of the LightNVM subsystem.

    In /drivers/Makefile, LightNVM is moved above block device drivers
    to make sure that the LightNVM media managers have been initialized
    before drivers under /drivers/block are initialized.

    Signed-off-by: Matias Bjørling
    Fix by Jens Axboe to remove an unneeded slab cache and the resulting
    memory leak.
    Signed-off-by: Jens Axboe

    Matias Bjørling
     

22 Oct, 2015

1 commit

  • This commit adds a driver API and ioctls for generically controlling
    Persistent Reservations at the block layer. Persistent Reservations
    are supported by SCSI and NVMe and allow controlling who gets access
    to a device in a shared storage setup.

    Note that we add a pr_ops structure to struct block_device_operations
    instead of adding the members directly to avoid bloating all instances
    of devices that will never support Persistent Reservations.
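
    The new ops table has roughly the shape sketched below; the prototypes
    are approximate and the struct name is changed to mark it as an
    illustration:

      #include <linux/blkdev.h>       /* struct block_device */
      #include <linux/pr.h>           /* enum pr_type */

      /* Approximate sketch of the Persistent Reservation ops hung off
       * block_device_operations via a pr_ops pointer.
       */
      struct my_pr_ops {
              int (*pr_register)(struct block_device *bdev, u64 old_key,
                                 u64 new_key, u32 flags);
              int (*pr_reserve)(struct block_device *bdev, u64 key,
                                enum pr_type type, u32 flags);
              int (*pr_release)(struct block_device *bdev, u64 key,
                                enum pr_type type);
              int (*pr_preempt)(struct block_device *bdev, u64 old_key,
                                u64 new_key, enum pr_type type, bool abort);
              int (*pr_clear)(struct block_device *bdev, u64 key);
      };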

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

29 Jul, 2015

1 commit

  • Currently we have two different ways to signal an I/O error on a BIO:

    (1) by clearing the BIO_UPTODATE flag
    (2) by returning a Linux errno value to the bi_end_io callback

    The first one has the drawback of only communicating a single possible
    error (-EIO), and the second one has the drawbacks of not being
    persistent when bios are queued up and of not being passed along from
    child to parent bio in the ever more popular chaining scenario. Having
    both mechanisms
    available has the additional drawback of utterly confusing driver authors
    and introducing bugs where various I/O submitters only deal with one of
    them, and the others have to add boilerplate code to deal with both kinds
    of error returns.

    So add a new bi_error field to store an errno value directly in struct
    bio and remove the existing mechanisms to clean all this up.
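
    After the change, a completion handler reads the error directly from
    the bio, along the lines of this sketch (the handler and device names
    are made up):

      #include <linux/bio.h>
      #include <linux/printk.h>

      /* Sketch of a completion callback in the new scheme: no BIO_UPTODATE
       * flag and no errno argument; the error value lives in the bio.
       */
      static void my_end_io(struct bio *bio)
      {
              if (bio->bi_error)
                      pr_err("my_dev: I/O failed with error %d\n", bio->bi_error);

              bio_put(bio);
      }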

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Reviewed-by: NeilBrown
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

17 Jul, 2015

1 commit

  • Lots of devices support huge discard sizes these days. Depending
    on how the device handles them internally, huge discards can
    introduce massive latencies (hundreds of msec) on the device side.

    We have a sysfs file, discard_max_bytes, that advertises the max
    hardware supported discard size. Make this writeable, and split
    the settings into a soft and a hard limit. The soft limit can be set
    anywhere from 'discard_granularity' up to the hardware limit.

    Add a new sysfs file, 'discard_max_hw_bytes', that shows the hw
    set limit.
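
    As a hedged user-space illustration (device name and cap are examples),
    the hardware limit can be read and a lower soft limit written back:

      #include <stdio.h>

      /* Illustrative only: cap discards on one device at 64 MiB to bound
       * per-discard latency, while the advertised hardware limit stays
       * visible in discard_max_hw_bytes.
       */
      int main(void)
      {
              unsigned long long hw = 0;
              FILE *r = fopen("/sys/block/sda/queue/discard_max_hw_bytes", "r");
              FILE *w;

              if (r) {
                      if (fscanf(r, "%llu", &hw) == 1)
                              printf("hardware discard limit: %llu bytes\n", hw);
                      fclose(r);
              }

              w = fopen("/sys/block/sda/queue/discard_max_bytes", "w");
              if (!w) {
                      perror("discard_max_bytes");
                      return 1;
              }
              fputs("67108864\n", w); /* 64 MiB soft cap; must not exceed the hw limit */
              fclose(w);
              return 0;
      }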

    Reviewed-by: Jeff Moyer
    Signed-off-by: Jens Axboe

    Jens Axboe