04 Oct, 2014

1 commit

  • Users of bio_clone_fast() do not want bios with their own bvecs.
    Allocating a bvec mempool as part of the bioset intended for such users
    is a waste of memory.

    bioset_create_nobvec() creates a bioset that doesn't have the bvec
    mempool.

    Signed-off-by: Jun'ichi Nomura
    Signed-off-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Junichi Nomura
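
    A minimal sketch of a clone-only user of the new helper (the driver
    names here are hypothetical, not taken from the commit):

        #include <linux/init.h>
        #include <linux/bio.h>

        static struct bio_set *clone_bs;        /* hypothetical per-driver bioset */

        static int __init example_init(void)
        {
                /* 16 reserved bios, no front padding, and no bvec mempool */
                clone_bs = bioset_create_nobvec(16, 0);
                if (!clone_bs)
                        return -ENOMEM;
                return 0;
        }

        static struct bio *example_clone(struct bio *parent)
        {
                /* the clone shares the parent's bvec table, so no bvecs
                 * ever need to come out of this bioset */
                return bio_clone_fast(parent, GFP_NOIO, clone_bs);
        }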
     

02 Aug, 2014

1 commit

  • Various subsystems can ask the bio subsystem to create a bio slab cache
    with some free space before the bio. This free space can be used for any
    purpose. Device mapper uses this per-bio-data feature to place some
    target-specific and device-mapper-specific data before the bio, so that
    the target-specific data doesn't have to be allocated separately.

    This per-bio-data mechanism is used in place of kmalloc, so we need the
    allocated slab to have the same memory alignment as memory allocated
    with kmalloc.

    Change bio_find_or_create_slab() so that it uses ARCH_KMALLOC_MINALIGN
    alignment when creating the slab cache. This is needed so that dm-crypt
    can use per-bio-data for encryption - the crypto subsystem assumes this
    data will have the same alignment as kmalloc'ed memory.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Mike Snitzer
    Acked-by: Jens Axboe

    Mikulas Patocka
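
    A minimal sketch of the per-bio-data usage this alignment guarantee is
    for (a hypothetical target, not taken from dm-crypt):

        #include <linux/device-mapper.h>

        struct example_io {                     /* hypothetical per-bio state */
                sector_t sector;
        };

        static int example_ctr(struct dm_target *ti, unsigned argc, char **argv)
        {
                /* reserve space in front of every bio issued to this target */
                ti->per_bio_data_size = sizeof(struct example_io);
                return 0;
        }

        static int example_map(struct dm_target *ti, struct bio *bio)
        {
                /* no separate kmalloc: the data lives before the bio and,
                 * with this change, has kmalloc-compatible alignment */
                struct example_io *io = dm_per_bio_data(bio, sizeof(struct example_io));

                io->sector = bio->bi_iter.bi_sector;
                return DM_MAPIO_REMAPPED;
        }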
     

25 Jun, 2014

1 commit

  • Another restriction inherited from NVMe: those devices don't support
    SG lists that have "gaps" in them. A gap occurs when the previous SG
    entry doesn't end on a page boundary. For NVMe, all SG entries must
    start at offset 0 (except the first) and end on a page boundary
    (except the last).

    Signed-off-by: Jens Axboe

    Jens Axboe
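
    A minimal sketch of the gap rule described above (a hypothetical
    helper, not the in-tree merge check):

        #include <linux/bio.h>

        /* true if two adjacent segments would leave a "gap" in the SG list */
        static bool bvec_pair_has_gap(const struct bio_vec *prev,
                                      const struct bio_vec *next)
        {
                /* the earlier segment must end on a page boundary ... */
                if ((prev->bv_offset + prev->bv_len) & (PAGE_SIZE - 1))
                        return true;
                /* ... and the later one must start at offset 0 */
                return next->bv_offset != 0;
        }

    A queue that advertises this restriction simply refuses to merge or
    add a segment whenever such a gap would result.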
     

11 Jun, 2014

2 commits

  • Pull block layer fixes from Jens Axboe:
    "Final small batch of fixes to be included before -rc1. Some general
    cleanups in here as well, but some of the blk-mq fixes we need for the
    NVMe conversion and/or scsi-mq. The pull request contains:

    - Support for not merging across a specified "chunk size", if set by
    the driver. Some NVMe devices perform poorly for IO that crosses
    such a chunk, so we need to support it generically as part of
    request merging, to avoid having to do complicated split logic. From
    me.

    - Bump max tag depth to 10Ki tags. Some SCSI devices have a huge
    shared tag space. Previously we failed with EINVAL if too large a tag
    depth was specified; now we truncate it and pass back the actual
    value. From me.

    - Various blk-mq rq init fixes from me and others.

    - A fix from Keith for queue entering on a dying queue in blk-mq.
    This is needed to prevent oopsing on hot device removal.

    - Fixup for blk-mq timer addition from Ming Lei.

    - Small round of performance fixes for mtip32xx from Sam Bradshaw.

    - Minor stack leak fix from Rickard Strandqvist.

    - Two __init annotations from Fabian Frederick"

    * 'for-linus' of git://git.kernel.dk/linux-block:
    block: add __init to blkcg_policy_register
    block: add __init to elv_register
    block: ensure that bio_add_page() always accepts a page for an empty bio
    blk-mq: add timer in blk_mq_start_request
    blk-mq: always initialize request->start_time
    block: blk-exec.c: Cleaning up local variable address returnd
    mtip32xx: minor performance enhancements
    blk-mq: ->timeout should be cleared in blk_mq_rq_ctx_init()
    blk-mq: don't allow queue entering for a dying queue
    blk-mq: bump max tag depth to 10K tags
    block: add blk_rq_set_block_pc()
    block: add notion of a chunk size for request merging

    Linus Torvalds
     
  • Commit 762380ad9322 added support for chunk sizes and disallowed
    merging across them, but in doing so it broke the rule of always
    allowing the addition of a single page to an empty bio. So relax the
    restriction a bit to allow for that, similarly to what we have always
    done.

    This fixes a crash with mkfs.xfs and 512b sector sizes on NVMe.

    Reported-by: Keith Busch
    Signed-off-by: Jens Axboe

    Jens Axboe
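
    A minimal sketch of the relaxed rule (a hypothetical helper, not the
    in-tree bio_add_page() code):

        #include <linux/bio.h>
        #include <linux/blkdev.h>

        static bool example_page_fits(struct request_queue *q,
                                      struct bio *bio, unsigned int len)
        {
                unsigned int max_sectors =
                        blk_max_size_offset(q, bio->bi_iter.bi_sector);

                if ((bio->bi_iter.bi_size + len) >> 9 <= max_sectors)
                        return true;    /* within the per-offset limit */

                /* over the limit: still accept the page if the bio is
                 * empty, so a single page can always be added */
                return bio->bi_vcnt == 0;
        }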
     

10 Jun, 2014

1 commit

  • Pull cgroup updates from Tejun Heo:
    "A lot of activities on cgroup side. Heavy restructuring including
    locking simplification took place to improve the code base and enable
    implementation of the unified hierarchy, which currently exists behind
    a __DEVEL__ mount option. The core support is mostly complete but
    individual controllers need further work. To explain the design and
    rationales of the the unified hierarchy

    Documentation/cgroups/unified-hierarchy.txt

    is added.

    Another notable change is the css (cgroup_subsys_state - what each
    controller uses to identify and interact with a cgroup) iteration
    update. This is part of continuing updates on css object lifetime and
    visibility. cgroup started with reference count draining on removal
    way back, and it is now reaching a point where csses behave and are
    iterated like normal refcounted objects, albeit with some complexities
    to allow distinguishing the state where they're being deleted. The css
    iteration update isn't taken advantage of yet but is planned to be
    used to simplify memcg significantly"

    * 'for-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (77 commits)
    cgroup: disallow disabled controllers on the default hierarchy
    cgroup: don't destroy the default root
    cgroup: disallow debug controller on the default hierarchy
    cgroup: clean up MAINTAINERS entries
    cgroup: implement css_tryget()
    device_cgroup: use css_has_online_children() instead of has_children()
    cgroup: convert cgroup_has_live_children() into css_has_online_children()
    cgroup: use CSS_ONLINE instead of CGRP_DEAD
    cgroup: iterate cgroup_subsys_states directly
    cgroup: introduce CSS_RELEASED and reduce css iteration fallback window
    cgroup: move cgroup->serial_nr into cgroup_subsys_state
    cgroup: link all cgroup_subsys_states in their sibling lists
    cgroup: move cgroup->sibling and ->children into cgroup_subsys_state
    cgroup: remove cgroup->parent
    device_cgroup: remove direct access to cgroup->children
    memcg: update memcg_has_children() to use css_next_child()
    memcg: remove tasks/children test from mem_cgroup_force_empty()
    cgroup: remove css_parent()
    cgroup: skip refcnting on normal root csses and cgrp_dfl_root self css
    cgroup: use cgroup->self.refcnt for cgroup refcnting
    ...

    Linus Torvalds
     

06 Jun, 2014

1 commit

  • Some drivers have different limits on what size a request should
    optimally be, depending on the offset of the request, similar to
    dividing a device into chunks. Add a setting that allows the driver
    to inform the block layer of such a chunk size. The block layer will
    then prevent merging across the chunks.

    This is needed to optimally support NVMe with a non-zero stripe size.

    Signed-off-by: Jens Axboe

    Jens Axboe
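
    A minimal sketch of how a driver might use the new setting (the values
    here are hypothetical):

        #include <linux/blkdev.h>

        static void example_set_limits(struct request_queue *q)
        {
                /* 256 sectors = 128KiB chunks; must be a power of two */
                blk_queue_chunk_sectors(q, 256);
        }

    With a chunk size set, the largest IO the block layer will build at a
    given offset is roughly

        chunk_sectors - (offset & (chunk_sectors - 1))

    i.e. the distance to the next chunk boundary, so merges never cross it.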
     

19 May, 2014

1 commit