12 Feb, 2019

3 commits

  • Since I can remember DM has forced the block layer to allow the
    allocation and initialization of the request_queue to be distinct
    operations. Reason for this is block/genhd.c:add_disk() has requires
    that the request_queue (and associated bdi) be tied to the gendisk
    before add_disk() is called -- because add_disk() also deals with
    exposing the request_queue via blk_register_queue().

    DM's dynamic creation of arbitrary device types (and associated
    request_queue types) requires the DM device's gendisk be available so
    that DM table loads can establish a master/slave relationship with
    subordinate devices that are referenced by loaded DM tables -- using
    bd_link_disk_holder(). But until these DM tables, and their associated
    subordinate devices, are known DM cannot know what type of request_queue
    it needs -- nor what its queue_limits should be.

    This chicken and egg scenario has created all manner of problems for DM
    and, at times, the block layer.

    Summary of changes:

    - Add device_add_disk_no_queue_reg() and add_disk_no_queue_reg() variant
    that drivers may use to add a disk without also calling
    blk_register_queue(). Driver must call blk_register_queue() once its
    request_queue is fully initialized.

    - Return early from blk_unregister_queue() if QUEUE_FLAG_REGISTERED
    is not set. It won't be set if driver used add_disk_no_queue_reg()
    but driver encounters an error and must del_gendisk() before calling
    blk_register_queue().

    - Export blk_register_queue().

    These changes allow DM to use add_disk_no_queue_reg() to anchor its
    gendisk as the "master" for master/slave relationships DM must establish
    with subordinate devices referenced in DM tables that get loaded. Once
    all "slave" devices for a DM device are known its request_queue can be
    properly initialized and then advertised via sysfs -- important
    improvement being that no request_queue resource initialization
    performed by blk_register_queue() is missed for DM devices anymore.

    Signed-off-by: Mike Snitzer
    Reviewed-by: Ming Lei
    Signed-off-by: Jens Axboe
    (cherry picked from commit fa70d2e2c4a0a54ced98260c6a176cc94c876d27)

    Mike Snitzer
     
  • With this flag a driver can create a gendisk that can be used for I/O
    submission inside the kernel, but which is not registered as user
    facing block device. This will be useful for the NVMe multipath
    implementation.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe
    (cherry picked from commit 8ddcd653257c18a669fcb75ee42c37054908e0d6)

    Christoph Hellwig
     
  • The hidden gendisks introduced in the next patch need to keep the dev
    field in their struct device empty so that udev won't try to create
    block device nodes for them. To support that rewrite disk_devt to
    look at the major and first_minor fields in the gendisk itself instead
    of looking into the struct device.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe
    (cherry picked from commit 517bf3c306bad4fe0da631f90ae2ec40924dee2b)

    Christoph Hellwig
     

21 Jun, 2018

1 commit

  • [ Upstream commit bf0ddaba65ddbb2715af97041da8e7a45b2d8628 ]

    When the blk-mq inflight implementation was added, /proc/diskstats was
    converted to use it, but /sys/block/$dev/inflight was not. Fix it by
    adding another helper to count in-flight requests by data direction.

    Fixes: f299b7c7a9de ("blk-mq: provide internal in-flight variant")
    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Omar Sandoval
     

30 Nov, 2017

1 commit

  • commit 67f2519fe2903c4041c0e94394d14d372fe51399 upstream.

    guard_bio_eod() needs to look at the partition capacity, not just the
    capacity of the whole device, when determining if truncation is
    necessary.

    [ 60.268688] attempt to access beyond end of device
    [ 60.268690] unknown-block(9,1): rw=0, want=67103509, limit=67103506
    [ 60.268693] buffer_io_error: 2 callbacks suppressed
    [ 60.268696] Buffer I/O error on dev md1p7, logical block 4524305, async page read

    Fixes: 74d46992e0d9 ("block: replace bi_bdev with a gendisk pointer and partitions index")
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Greg Edwards
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Greg Edwards
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information it it.
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from of the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging was:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

10 Aug, 2017

3 commits

  • We don't have to inc/dec some counter, since we can just
    iterate the tags. That makes inc/dec a noop, but means we
    have to iterate busy tags to get an in-flight count.

    Reviewed-by: Bart Van Assche
    Reviewed-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Instead of returning the count that matches the partition, pass
    in an array of two ints. Index 0 will be filled with the inflight
    count for the partition in question, and index 1 will filled
    with the root inflight count, if the partition passed in is not the
    root.

    This is in preparation for being able to calculate both in one
    go.

    Reviewed-by: Bart Van Assche
    Reviewed-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • No functional change in this patch, just in preparation for
    basing the inflight mechanism on the queue in question.

    Reviewed-by: Bart Van Assche
    Reviewed-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Jens Axboe
     

05 Jun, 2017

1 commit

  • This helper was only used by IMA of all things, which would get spurious
    errors if CONFIG_BLOCK is disabled. Just opencode the call there.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Amir Goldstein
    Acked-by: Mimi Zohar
    Reviewed-by: Andy Shevchenko

    Christoph Hellwig
     

22 Apr, 2017

1 commit

  • Commit 25520d55cdb6 ("block: Inline blk_integrity in struct gendisk")
    introduced blk_integrity_revalidate(), which seems to assume ownership
    of the stable pages flag and unilaterally clears it if no blk_integrity
    profile is registered:

    if (bi->profile)
    disk->queue->backing_dev_info->capabilities |=
    BDI_CAP_STABLE_WRITES;
    else
    disk->queue->backing_dev_info->capabilities &=
    ~BDI_CAP_STABLE_WRITES;

    It's called from revalidate_disk() and rescan_partitions(), making it
    impossible to enable stable pages for drivers that support partitions
    and don't use blk_integrity: while the call in revalidate_disk() can be
    trivially worked around (see zram, which doesn't support partitions and
    hence gets away with zram_revalidate_disk()), rescan_partitions() can
    be triggered from userspace at any time. This breaks rbd, where the
    ceph messenger is responsible for generating/verifying CRCs.

    Since blk_integrity_{un,}register() "must" be used for (un)registering
    the integrity profile with the block layer, move BDI_CAP_STABLE_WRITES
    setting there. This way drivers that call blk_integrity_register() and
    use integrity infrastructure won't interfere with drivers that don't
    but still want stable pages.

    Fixes: 25520d55cdb6 ("block: Inline blk_integrity in struct gendisk")
    Cc: "Martin K. Petersen"
    Cc: Christoph Hellwig
    Cc: Mike Snitzer
    Cc: stable@vger.kernel.org # 4.4+, needs backporting
    Tested-by: Dan Williams
    Signed-off-by: Ilya Dryomov
    Signed-off-by: Jens Axboe

    Ilya Dryomov
     

25 Mar, 2017

1 commit


09 Mar, 2017

1 commit

  • This reverts commit 0dba1314d4f81115dce711292ec7981d17231064. It causes
    leaking of device numbers for SCSI when SCSI registers multiple gendisks
    for one request_queue in succession. It can be easily reproduced using
    Omar's script [1] on kernel with CONFIG_DEBUG_TEST_DRIVER_REMOVE.
    Furthermore the protection provided by this commit is not needed anymore
    as the problem it was fixing got also fixed by commit 165a5e22fafb
    "block: Move bdi_unregister() to del_gendisk()".

    [1]: http://marc.info/?l=linux-block&m=148554717109098&w=2

    Signed-off-by: Jan Kara
    Acked-by: Dan Williams
    Tested-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Jan Kara
     

02 Feb, 2017

1 commit

  • Warnings of the following form occur because scsi reuses a devt number
    while the block layer still has it referenced as the name of the bdi
    [1]:

    WARNING: CPU: 1 PID: 93 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x62/0x80
    sysfs: cannot create duplicate filename '/devices/virtual/bdi/8:192'
    [..]
    Call Trace:
    dump_stack+0x86/0xc3
    __warn+0xcb/0xf0
    warn_slowpath_fmt+0x5f/0x80
    ? kernfs_path_from_node+0x4f/0x60
    sysfs_warn_dup+0x62/0x80
    sysfs_create_dir_ns+0x77/0x90
    kobject_add_internal+0xb2/0x350
    kobject_add+0x75/0xd0
    device_add+0x15a/0x650
    device_create_groups_vargs+0xe0/0xf0
    device_create_vargs+0x1c/0x20
    bdi_register+0x90/0x240
    ? lockdep_init_map+0x57/0x200
    bdi_register_owner+0x36/0x60
    device_add_disk+0x1bb/0x4e0
    ? __pm_runtime_use_autosuspend+0x5c/0x70
    sd_probe_async+0x10d/0x1c0
    async_run_entry_fn+0x39/0x170

    This is a brute-force fix to pass the devt release information from
    sd_probe() to the locations where we register the bdi,
    device_add_disk(), and unregister the bdi, blk_cleanup_queue().

    Thanks to Omar for the quick reproducer script [2]. This patch survives
    where an unmodified kernel fails in a few seconds.

    [1]: https://marc.info/?l=linux-scsi&m=147116857810716&w=4
    [2]: http://marc.info/?l=linux-block&m=148554717109098&w=2

    Cc: James Bottomley
    Cc: Bart Van Assche
    Cc: "Martin K. Petersen"
    Cc: Jan Kara
    Reported-by: Omar Sandoval
    Tested-by: Omar Sandoval
    Signed-off-by: Dan Williams
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Bart Van Assche
    Signed-off-by: Jens Axboe

    Dan Williams
     

23 Dec, 2016

1 commit


11 Oct, 2016

1 commit

  • The __latent_entropy gcc attribute can be used only on functions and
    variables. If it is on a function then the plugin will instrument it for
    gathering control-flow entropy. If the attribute is on a variable then
    the plugin will initialize it with random contents. The variable must
    be an integer, an integer array type or a structure with integer fields.

    These specific functions have been selected because they are init
    functions (to help gather boot-time entropy), are called at unpredictable
    times, or they have variable loops, each of which provide some level of
    latent entropy.

    Signed-off-by: Emese Revfy
    [kees: expanded commit message]
    Signed-off-by: Kees Cook

    Emese Revfy
     

28 Jun, 2016

1 commit

  • Now that all drivers that specify a ->driverfs_dev have been converted
    to device_add_disk(), the pointer can be removed from struct gendisk.

    Cc: Jens Axboe
    Cc: Ross Zwisler
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Dan Williams

    Dan Williams
     

16 Jun, 2016

1 commit

  • In preparation for removing the ->driverfs_dev member of a gendisk, add
    an api that takes the parent device as a parameter to add_disk(). For
    now this maintains the status quo of WARN()ing on failure, but not
    return a error code.

    Reviewed-by: Christoph Hellwig
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Bart Van Assche
    Signed-off-by: Dan Williams

    Dan Williams
     

21 May, 2016

1 commit

  • UUID library provides uuid_be type and uuid_be_to_bin() function. This
    substitutes open coded variant by generic library calls.

    Signed-off-by: Andy Shevchenko
    Reviewed-by: Matt Fleming
    Cc: Dmitry Kasatkin
    Cc: Mimi Zohar
    Cc: Rasmus Villemoes
    Cc: Arnd Bergmann
    Cc: "Theodore Ts'o"
    Cc: Al Viro
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Shevchenko
     

10 Jan, 2016

2 commits

  • These actions are completely managed by a block driver or can use the
    badblocks api directly.

    Signed-off-by: Dan Williams

    Dan Williams
     
  • NVDIMM devices, which can behave more like DRAM rather than block
    devices, may develop bad cache lines, or 'poison'. A block device
    exposed by the pmem driver can then consume poison via a read (or
    write), and cause a machine check. On platforms without machine
    check recovery features, this would mean a crash.

    The block device maintaining a runtime list of all known sectors that
    have poison can directly avoid this, and also provide a path forward
    to enable proper handling/recovery for DAX faults on such a device.

    Use the new badblock management interfaces to add a badblocks list to
    gendisks.

    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
     

22 Oct, 2015

3 commits

  • A trace like the following proceeds a crash in bio_integrity_process()
    when it goes to use an already freed blk_integrity profile.

    BUG: unable to handle kernel paging request at ffff8800d31b10d8
    IP: [] 0xffff8800d31b10d8
    PGD 2f65067 PUD 21fffd067 PMD 80000000d30001e3
    Oops: 0011 [#1] SMP
    Dumping ftrace buffer:
    ---------------------------------
    ndctl-2222 2.... 44526245us : disk_release: pmem1s
    systemd--2223 4.... 44573945us : bio_integrity_endio: pmem1s
    -409 4.... 44574005us : bio_integrity_process: pmem1s
    ---------------------------------
    [..]
    Call Trace:
    [] ? bio_integrity_process+0x159/0x2d0
    [] bio_integrity_verify_fn+0x36/0x60
    [] process_one_work+0x1cc/0x4e0

    Given that a request_queue is pinned while i/o is in flight and that a
    gendisk is allowed to have a shorter lifetime, move blk_integrity to
    request_queue to satisfy requests arriving after the gendisk has been
    torn down.

    Cc: Christoph Hellwig
    Cc: Martin K. Petersen
    [martin: fix the CONFIG_BLK_DEV_INTEGRITY=n case]
    Tested-by: Ross Zwisler
    Signed-off-by: Dan Williams
    Signed-off-by: Jens Axboe

    Dan Williams
     
  • Up until now the_integrity profile has been dynamically allocated and
    attached to struct gendisk after the disk has been made active.

    This causes problems because NVMe devices need to register the profile
    prior to the partition table being read due to a mandatory metadata
    buffer requirement. In addition, DM goes through hoops to deal with
    preallocating, but not initializing integrity profiles.

    Since the integrity profile is small (4 bytes + a pointer), Christoph
    suggested moving it to struct gendisk proper. This requires several
    changes:

    - Moving the blk_integrity definition to genhd.h.

    - Inlining blk_integrity in struct gendisk.

    - Removing the dynamic allocation code.

    - Adding helper functions which allow gendisk to set up and tear down
    the integrity sysfs dir when a disk is added/deleted.

    - Adding a blk_integrity_revalidate() callback for updating the stable
    pages bdi setting.

    - The calls that depend on whether a device has an integrity profile or
    not now key off of the bi->profile pointer.

    - Simplifying the integrity support routines in DM (Mike Snitzer).

    Signed-off-by: Martin K. Petersen
    Reported-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Mike Snitzer
    Cc: Dan Williams
    Signed-off-by: Dan Williams
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • The integrity kobject purely exists to support the integrity
    subdirectory in sysfs and doesn't really have anything to do with the
    blk_integrity data structure. Move the kobject to struct gendisk where
    it belongs.

    Signed-off-by: Martin K. Petersen
    Reported-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Dan Williams
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     

17 Jul, 2015

2 commits


18 Apr, 2014

1 commit

  • Mostly scripted conversion of the smp_mb__* barriers.

    Signed-off-by: Peter Zijlstra
    Acked-by: Paul E. McKenney
    Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org
    Cc: Linus Torvalds
    Cc: linux-arch@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

26 Feb, 2013

1 commit

  • Commit "85865c1 ima: add policy support for file system uuid"
    introduced a CONFIG_BLOCK dependency. This patch defines a
    wrapper called blk_part_pack_uuid(), which returns -EINVAL,
    when CONFIG_BLOCK is not defined.

    security/integrity/ima/ima_policy.c:538:4: error: implicit declaration
    of function 'part_pack_uuid' [-Werror=implicit-function-declaration]

    Changelog v2:
    - Reference commit number in patch description
    Changelog v1:
    - rename ima_part_pack_uuid() to blk_part_pack_uuid()
    - resolve scripts/checkpatch.pl warnings
    Changelog v0:
    - fix UUID scripts/Lindent msgs

    Reported-by: Randy Dunlap
    Reported-by: David Rientjes
    Signed-off-by: Mimi Zohar
    Acked-by: David Rientjes
    Acked-by: Randy Dunlap
    Cc: Jens Axboe
    Signed-off-by: James Morris

    Mimi Zohar
     

23 Nov, 2012

1 commit

  • This will allow other types of UUID to be stored here, aside from true
    UUIDs. This also simplifies code that uses this field, since it's usually
    constructed from a, used as a, or compared to other, strings.

    Note: A simplistic approach here would be to set uuid_str[36]=0 whenever a
    /PARTNROFF option was found to be present. However, this modifies the
    input string, and causes subsequent calls to devt_from_partuuid() not to
    see the /PARTNROFF option, which causes different results. In order to
    avoid misleading future maintainers, this parameter is marked const.

    Signed-off-by: Stephen Warren
    Cc: Tejun Heo
    Cc: Will Drewry
    Cc: Kay Sievers
    Signed-off-by: Andrew Morton
    Signed-off-by: Jens Axboe

    Stephen Warren
     

02 Aug, 2012

1 commit

  • Pull core block IO bits from Jens Axboe:
    "The most complicated part if this is the request allocation rework by
    Tejun, which has been queued up for a long time and has been in
    for-next ditto as well.

    There are a few commits from yesterday and today, mostly trivial and
    obvious fixes. So I'm pretty confident that it is sound. It's also
    smaller than usual."

    * 'for-3.6/core' of git://git.kernel.dk/linux-block:
    block: remove dead func declaration
    block: add partition resize function to blkpg ioctl
    block: uninitialized ioc->nr_tasks triggers WARN_ON
    block: do not artificially constrain max_sectors for stacking drivers
    blkcg: implement per-blkg request allocation
    block: prepare for multiple request_lists
    block: add q->nr_rqs[] and move q->rq.elvpriv to q->nr_rqs_elvpriv
    blkcg: inline bio_blkcg() and friends
    block: allocate io_context upfront
    block: refactor get_request[_wait]()
    block: drop custom queue draining used by scsi_transport_{iscsi|fc}
    mempool: add @gfp_mask to mempool_create_node()
    blkcg: make root blkcg allocation use %GFP_KERNEL
    blkcg: __blkg_lookup_create() doesn't need radix preload

    Linus Torvalds
     

01 Aug, 2012

1 commit

  • Add a new operation code (BLKPG_RESIZE_PARTITION) to the BLKPG ioctl that
    allows altering the size of an existing partition, even if it is currently
    in use.

    This patch converts hd_struct->nr_sects into sequence counter because
    One might extend a partition while IO is happening to it and update of
    nr_sects can be non-atomic on 32bit machines with 64bit sector_t. This
    can lead to issues like reading inconsistent size of a partition. Sequence
    counter have been used so that readers don't have to take bdev mutex lock
    as we call sector_in_part() very frequently.

    Now all the access to hd_struct->nr_sects should happen using sequence
    counter read/update helper functions part_nr_sects_read/part_nr_sects_write.
    There is one exception though, set_capacity()/get_capacity(). I think
    theoritically race should exist there too but this patch does not
    modify set_capacity()/get_capacity() due to sheer number of call sites
    and I am afraid that change might break something. I have left that as a
    TODO item. We can handle it later if need be. This patch does not introduce
    any new races as such w.r.t set_capacity()/get_capacity().

    v2: Add CONFIG_LBDAF test to UP preempt case as suggested by Phillip.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Phillip Susi
    Signed-off-by: Jens Axboe

    Vivek Goyal
     

17 Jul, 2012

1 commit

  • This function is not really specific to the genhd layer and there are various
    re-implementations or open-coded variants of it all throughout the kernel. To
    avoid further duplications move the function to a more generic place.

    While moving also convert it from a macro to a inline function.

    Potential users of this function can be detected and converted using the
    following coccinelle patch:

    //
    @@
    expression k;
    @@
    -container_of(k, struct device, kobj)
    +kobj_to_dev(kobj)
    //

    Signed-off-by: Lars-Peter Clausen
    Signed-off-by: Greg Kroah-Hartman

    Lars-Peter Clausen
     

15 May, 2012

1 commit

  • 6d1d8050b4bc8 "block, partition: add partition_meta_info to hd_struct"
    added part_unpack_uuid() which assumes that the passed in buffer has
    enough space for sprintfing "%pU" - 37 characters including '\0'.

    Unfortunately, b5af921ec0233 "init: add support for root devices
    specified by partition UUID" supplied 33 bytes buffer to the function
    leading to the following panic with stackprotector enabled.

    Kernel panic - not syncing: stack-protector: Kernel stack corrupted in: ffffffff81b14c7e

    [] panic+0xba/0x1c6
    [] ? printk_all_partitions+0x259/0x26xb
    [] __stack_chk_fail+0x1b/0x20
    [] printk_all_paritions+0x259/0x26xb
    [] mount_block_root+0x1bc/0x27f
    [] mount_root+0x57/0x5b
    [] prepare_namespace+0x13d/0x176
    [] ? release_tgcred.isra.4+0x330/0x30
    [] kernel_init+0x155/0x15a
    [] ? schedule_tail+0x27/0xb0
    [] kernel_thread_helper+0x5/0x10
    [] ? start_kernel+0x3c5/0x3c5
    [] ? gs_change+0x13/0x13

    Increase the buffer size, remove the dangerous part_unpack_uuid() and
    use snprintf() directly from printk_all_partitions().

    Signed-off-by: Tejun Heo
    Reported-by: Szymon Gruszczynski
    Cc: Will Drewry
    Cc: stable@vger.kernel.org
    Signed-off-by: Jens Axboe

    Tejun Heo
     

02 Mar, 2012

1 commit

  • Since 2.6.39 (1196f8b), when a driver returns -ENOMEDIUM for open(),
    __blkdev_get() calls rescan_partitions() to remove
    in-kernel partition structures and raise KOBJ_CHANGE uevent.

    However it ends up calling driver's revalidate_disk without open
    and could cause oops.

    In the case of SCSI:

    process A process B
    ----------------------------------------------
    sys_open
    __blkdev_get
    sd_open
    returns -ENOMEDIUM
    scsi_remove_device

    rescan_partitions
    sd_revalidate_disk

    Oopses are reported here:
    http://marc.info/?l=linux-scsi&m=132388619710052

    This patch separates the partition invalidation from rescan_partitions()
    and use it for -ENOMEDIUM case.

    Reported-by: Huajun Li
    Signed-off-by: Jun'ichi Nomura
    Acked-by: Tejun Heo
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe

    Jun'ichi Nomura
     

04 Jan, 2012

1 commit


10 Nov, 2011

1 commit

  • This reverts commit a72c5e5eb738033938ab30d6a634b74d1d060f10.

    The commit introduced alias for block devices which is intended to be
    used during logging although actual usage hasn't been committed yet.
    This approach adds very limited benefit (raw log might be easier to
    follow) which can be trivially implemented in userland but has a lot
    of problems.

    It is much worse than netif renames because it doesn't rename the
    actual device but just adds conveninence name which isn't used
    universally or enforced. Everything internal including device lookup
    and sysfs still uses the internal name and nothing prevents two
    devices from using conflicting alias - ie. sda can have sdb as its
    alias.

    This has been nacked by people working on device driver core, block
    layer and kernel-userland interface and shouldn't have been
    upstreamed. Revert it.

    http://thread.gmane.org/gmane.linux.kernel/1155104
    http://thread.gmane.org/gmane.linux.scsi/68632
    http://thread.gmane.org/gmane.linux.scsi/69776

    Signed-off-by: Tejun Heo
    Acked-by: Greg Kroah-Hartman
    Acked-by: Kay Sievers
    Cc: "James E.J. Bottomley"
    Cc: Nao Nishijima
    Cc: Alan Cox
    Cc: Al Viro
    Signed-off-by: Jens Axboe

    Tejun Heo
     

05 Nov, 2011

1 commit

  • * 'for-3.2/drivers' of git://git.kernel.dk/linux-block: (30 commits)
    virtio-blk: use ida to allocate disk index
    hpsa: add small delay when using PCI Power Management to reset for kump
    cciss: add small delay when using PCI Power Management to reset for kump
    xen/blkback: Fix two races in the handling of barrier requests.
    xen/blkback: Check for proper operation.
    xen/blkback: Fix the inhibition to map pages when discarding sector ranges.
    xen/blkback: Report VBD_WSECT (wr_sect) properly.
    xen/blkback: Support 'feature-barrier' aka old-style BARRIER requests.
    xen-blkfront: plug device number leak in xlblk_init() error path
    xen-blkfront: If no barrier or flush is supported, use invalid operation.
    xen-blkback: use kzalloc() in favor of kmalloc()+memset()
    xen-blkback: fixed indentation and comments
    xen-blkfront: fix a deadlock while handling discard response
    xen-blkfront: Handle discard requests.
    xen-blkback: Implement discard requests ('feature-discard')
    xen-blkfront: add BLKIF_OP_DISCARD and discard request struct
    drivers/block/loop.c: remove unnecessary bdev argument from loop_clr_fd()
    drivers/block/loop.c: emit uevent on auto release
    drivers/block/cpqarray.c: use pci_dev->revision
    loop: always allow userspace partitions and optionally support automatic scanning
    ...

    Fic up trivial header file includsion conflict in drivers/block/loop.c

    Linus Torvalds
     

29 Aug, 2011

1 commit

  • This patch allows the user to set an "alias" of the disk via sysfs interface.

    This patch only adds a new attribute "alias" in gendisk structure.
    To show the alias instead of the device name in kernel messages,
    we need to revise printk messages and use alias_name() in them.

    Example:
    (current) printk("disk name is %s\n", disk->disk_name);
    (new) printk("disk name is %s\n", alias_name(disk));

    Users can use alphabets, numbers, '-' and '_' in "alias" attribute. A disk can
    have an "alias" which length is up to 255 bytes. This attribute is write-once.

    Suggested-by: James Bottomley
    Suggested-by: Jon Masters
    Signed-off-by: Nao Nishijima
    Signed-off-by: James Bottomley

    Nao Nishijima
     

24 Aug, 2011

1 commit

  • There are cases where suppressing partition scan is useful - e.g. for
    lo devices and pseudo SATA devices which advertise to be a disk but
    get upset on partition scan (some port multiplier control devices show
    such behavior).

    This patch adds GENHD_FL_NO_PART_SCAN which suppresses partition scan
    regardless of the number of possible partitions. disk_partitionable()
    is renamed to disk_part_scan_enabled() as suppressing partition scan
    doesn't imply the device can't be partitioned using
    BLKPG_ADD/DEL_PARTITION calls from userland. show_partition() now
    directly tests disk_max_parts() to maintain backward-compatibility.

    -v2: Updated to make it clear that only partition scan is suppressed
    not partitioning itself as suggested by Kay Sievers.

    Signed-off-by: Tejun Heo
    Cc: Kay Sievers
    Signed-off-by: Jens Axboe

    Tejun Heo
     

01 Jul, 2011

1 commit

  • Currently, only open(2) is defined as the 'clearing' point. It has
    two roles - first, it's an acknowledgement from userland indicating
    that the event has been received and kernel can clear pending states
    and proceed to generate more events. Secondly, it's passed on to
    device drivers as a hint indicating that a synchronization point has
    been reached and it might want to take a deeper look at the device.

    The latter currently is only used by sr which uses two different
    mechanisms - GET_EVENT_MEDIA_STATUS_NOTIFICATION and TEST_UNIT_READY
    to discover events, where the former is lighter weight and safe to be
    used repeatedly but may not provide full coverage. Among other
    things, GET_EVENT can't detect media removal while TUR can.

    This patch makes close(2) - blkdev_put() - indicate clearing hint for
    MEDIA_CHANGE to drivers. disk_check_events() is renamed to
    disk_flush_events() and updated to take @mask for events to flush
    which is or'd to ev->clearing and will be passed to the driver on the
    next ->check_events() invocation.

    This change makes sr generate MEDIA_CHANGE when media is ejected from
    userland - e.g. with eject(1).

    Note: Given the current usage, it seems @clearing hint is needlessly
    complex. disk_clear_events() can simply clear all events and the hint
    can be boolean @flush.

    Signed-off-by: Tejun Heo
    Cc: Kay Sievers
    Signed-off-by: Jens Axboe

    Tejun Heo