09 Apr, 2017

1 commit


25 Dec, 2016

1 commit


26 Oct, 2016

2 commits

  • Discontinue having the brd driver destructively free all pages in the
    ramdisk in response to the BLKFLSBUF ioctl. Doing so allows a BLKFLSBUF
    ioctl issued to a logical partition to destroy pages of the parent brd
    device (and all other partitions of that brd device).

    This change breaks compatibility - but in this case the compatibility
    breaks more than it helps.

    Reported-by: Mikulas Patocka
    Suggested-by: Christoph Hellwig
    Signed-off-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Mike Snitzer
     
  • Currently rd_size was int which lead to overflow and bogus device size
    once the requested ramdisk size was 1 TB or more. Although these days
    ramdisks with 1 TB size are mostly a mistake, the days when they are
    useful are not far.

    Reported-by: Bart Van Assche
    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe

    Jan Kara
     

08 Aug, 2016

1 commit

  • Commit abf545484d31 changed it from an 'rw' flags type to the
    newer ops based interface, but now we're effectively leaking
    some bdev internals to the rest of the kernel. Since we only
    care about whether it's a read or a write at that level, just
    pass in a bool 'is_write' parameter instead.

    Then we can also move op_is_write() and friends back under
    CONFIG_BLOCK protection.

    Reviewed-by: Mike Christie
    Signed-off-by: Jens Axboe

    Jens Axboe
     

05 Aug, 2016

1 commit

  • The rw_page users were not converted to use bio/req ops. As a result
    bdev_write_page is not passing down REQ_OP_WRITE and the IOs will
    be sent down as reads.

    Signed-off-by: Mike Christie
    Fixes: 4e1b2d52a80d ("block, fs, drivers: remove REQ_OP compat defs and related code")

    Modified by me to:

    1) Drop op_flags passing into ->rw_page(), as we don't use it.
    2) Make op_is_write() and friends safe to use for !CONFIG_BLOCK

    Signed-off-by: Jens Axboe

    Mike Christie
     

29 Jul, 2016

1 commit

  • Pull libnvdimm updates from Dan Williams:

    - Replace pcommit with ADR / directed-flushing.

    The pcommit instruction, which has not shipped on any product, is
    deprecated. Instead, the requirement is that platforms implement
    either ADR, or provide one or more flush addresses per nvdimm.

    ADR (Asynchronous DRAM Refresh) flushes data in posted write buffers
    to the memory controller on a power-fail event.

    Flush addresses are defined in ACPI 6.x as an NVDIMM Firmware
    Interface Table (NFIT) sub-structure: "Flush Hint Address Structure".
    A flush hint is an mmio address that when written and fenced assures
    that all previous posted writes targeting a given dimm have been
    flushed to media.

    - On-demand ARS (address range scrub).

    Linux uses the results of the ACPI ARS commands to track bad blocks
    in pmem devices. When latent errors are detected we re-scrub the
    media to refresh the bad block list, userspace can also request a
    re-scrub at any time.

    - Support for the Microsoft DSM (device specific method) command
    format.

    - Support for EDK2/OVMF virtual disk device memory ranges.

    - Various fixes and cleanups across the subsystem.

    * tag 'libnvdimm-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (41 commits)
    libnvdimm-btt: Delete an unnecessary check before the function call "__nd_device_register"
    nfit: do an ARS scrub on hitting a latent media error
    nfit: move to nfit/ sub-directory
    nfit, libnvdimm: allow an ARS scrub to be triggered on demand
    libnvdimm: register nvdimm_bus devices with an nd_bus driver
    pmem: clarify a debug print in pmem_clear_poison
    x86/insn: remove pcommit
    Revert "KVM: x86: add pcommit support"
    nfit, tools/testing/nvdimm/: unify shutdown paths
    libnvdimm: move ->module to struct nvdimm_bus_descriptor
    nfit: cleanup acpi_nfit_init calling convention
    nfit: fix _FIT evaluation memory leak + use after free
    tools/testing/nvdimm: add manufacturing_{date|location} dimm properties
    tools/testing/nvdimm: add virtual ramdisk range
    acpi, nfit: treat virtual ramdisk SPA as pmem region
    pmem: kill __pmem address space
    pmem: kill wmb_pmem()
    libnvdimm, pmem: use nvdimm_flush() for namespace I/O writes
    fs/dax: remove wmb_pmem()
    libnvdimm, pmem: flush posted-write queues on shutdown
    ...

    Linus Torvalds
     

27 Jul, 2016

1 commit

  • Pull block driver updates from Jens Axboe:
    "This branch also contains core changes. I've come to the conclusion
    that from 4.9 and forward, I'll be doing just a single branch. We
    often have dependencies between core and drivers, and it's hard to
    always split them up appropriately without pulling core into drivers
    when that happens.

    That said, this contains:

    - separate secure erase type for the core block layer, from
    Christoph.

    - set of discard fixes, from Christoph.

    - bio shrinking fixes from Christoph, as a followup up to the
    op/flags change in the core branch.

    - map and append request fixes from Christoph.

    - NVMeF (NVMe over Fabrics) code from Christoph. This is pretty
    exciting!

    - nvme-loop fixes from Arnd.

    - removal of ->driverfs_dev from Dan, after providing a
    device_add_disk() helper.

    - bcache fixes from Bhaktipriya and Yijing.

    - cdrom subchannel read fix from Vchannaiah.

    - set of lightnvm updates from Wenwei, Matias, Johannes, and Javier.

    - set of drbd updates and fixes from Fabian, Lars, and Philipp.

    - mg_disk error path fix from Bart.

    - user notification for failed device add for loop, from Minfei.

    - NVMe in general:
    + NVMe delay quirk from Guilherme.
    + SR-IOV support and command retry limits from Keith.
    + fix for memory-less NUMA node from Masayoshi.
    + use UINT_MAX for discard sectors, from Minfei.
    + cancel IO fixes from Ming.
    + don't allocate unused major, from Neil.
    + error code fixup from Dan.
    + use constants for PSDT/FUSE from James.
    + variable init fix from Jay.
    + fabrics fixes from Ming, Sagi, and Wei.
    + various fixes"

    * 'for-4.8/drivers' of git://git.kernel.dk/linux-block: (115 commits)
    nvme/pci: Provide SR-IOV support
    nvme: initialize variable before logical OR'ing it
    block: unexport various bio mapping helpers
    scsi/osd: open code blk_make_request
    target: stop using blk_make_request
    block: simplify and export blk_rq_append_bio
    block: ensure bios return from blk_get_request are properly initialized
    virtio_blk: use blk_rq_map_kern
    memstick: don't allow REQ_TYPE_BLOCK_PC requests
    block: shrink bio size again
    block: simplify and cleanup bvec pool handling
    block: get rid of bio_rw and READA
    block: don't ignore -EOPNOTSUPP blkdev_issue_write_same
    block: introduce BLKDEV_DISCARD_ZERO to fix zeroout
    NVMe: don't allocate unused nvme_major
    nvme: avoid crashes when node 0 is memoryless node.
    nvme: Limit command retries
    loop: Make user notify for adding loop device failed
    nvme-loop: fix nvme-loop Kconfig dependencies
    nvmet: fix return value check in nvmet_subsys_alloc()
    ...

    Linus Torvalds
     

21 Jul, 2016

2 commits

  • Currently, presence of direct_access() in block_device_operations
    indicates support of DAX on its block device. Because
    block_device_operations is instantiated with 'const', this DAX
    capablity may not be enabled conditinally.

    In preparation for supporting DAX to device-mapper devices, add
    QUEUE_FLAG_DAX to request_queue flags to advertise their DAX
    support. This will allow to set the DAX capability based on how
    mapped device is composed.

    Signed-off-by: Toshi Kani
    Acked-by: Dan Williams
    Signed-off-by: Mike Snitzer
    Cc: Jens Axboe
    Cc: Ross Zwisler
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc:
    Signed-off-by: Jens Axboe

    Toshi Kani
     
  • These two are confusing leftover of the old world order, combining
    values of the REQ_OP_ and REQ_ namespaces. For callers that don't
    special case we mostly just replace bi_rw with bio_data_dir or
    op_is_write, except for the few cases where a switch over the REQ_OP_
    values makes more sense. Any check for READA is replaced with an
    explicit check for REQ_RAHEAD. Also remove the READA alias for
    REQ_RAHEAD.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Mike Christie
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

13 Jul, 2016

1 commit

  • The __pmem address space was meant to annotate codepaths that touch
    persistent memory and need to coordinate a call to wmb_pmem(). Now that
    wmb_pmem() is gone, there is little need to keep this annotation.

    Cc: Christoph Hellwig
    Cc: Ross Zwisler
    Signed-off-by: Dan Williams

    Dan Williams
     

08 Jun, 2016

1 commit

  • This patch converts the simple bi_rw use cases in the block,
    drivers, mm and fs code to set/get the bio operation using
    bio_set_op_attrs/bio_op

    These should be simple one or two liner cases, so I just did them
    in one patch. The next patches handle the more complicated
    cases in a module per patch.

    Signed-off-by: Mike Christie
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Mike Christie
     

19 May, 2016

1 commit

  • 1/ If a mapping overlaps a bad sector fail the request.

    2/ Do not opportunistically report more dax-capable capacity than is
    requested when errors present.

    Reviewed-by: Jeff Moyer
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Dan Williams
    [vishal: fix a conflict with system RAM collision patches]
    [vishal: add a 'size' parameter to ->direct_access]
    [vishal: fix a conflict with DAX alignment check patches]
    Signed-off-by: Vishal Verma

    Dan Williams
     

05 Apr, 2016

1 commit

  • PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
    ago with promise that one day it will be possible to implement page
    cache with bigger chunks than PAGE_SIZE.

    This promise never materialized. And unlikely will.

    We have many places where PAGE_CACHE_SIZE assumed to be equal to
    PAGE_SIZE. And it's constant source of confusion on whether
    PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
    especially on the border between fs and mm.

    Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
    breakage to be doable.

    Let's stop pretending that pages in page cache are special. They are
    not.

    The changes are pretty straight-forward:

    - << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;

    - >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;

    - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

    - page_cache_get() -> get_page();

    - page_cache_release() -> put_page();

    This patch contains automated changes generated with coccinelle using
    script below. For some reason, coccinelle doesn't patch header files.
    I've called spatch for them manually.

    The only adjustment after coccinelle is revert of changes to
    PAGE_CAHCE_ALIGN definition: we are going to drop it later.

    There are few places in the code where coccinelle didn't reach. I'll
    fix them manually in a separate patch. Comments and documentation also
    will be addressed with the separate patch.

    virtual patch

    @@
    expression E;
    @@
    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    expression E;
    @@
    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    @@
    - PAGE_CACHE_SHIFT
    + PAGE_SHIFT

    @@
    @@
    - PAGE_CACHE_SIZE
    + PAGE_SIZE

    @@
    @@
    - PAGE_CACHE_MASK
    + PAGE_MASK

    @@
    expression E;
    @@
    - PAGE_CACHE_ALIGN(E)
    + PAGE_ALIGN(E)

    @@
    expression E;
    @@
    - page_cache_get(E)
    + get_page(E)

    @@
    expression E;
    @@
    - page_cache_release(E)
    + put_page(E)

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

16 Mar, 2016

1 commit

  • Avoid that discard requests with size => PAGE_SIZE fail with
    -EIO. Refuse discard requests if the discard size is not a
    multiple of the page size.

    Fixes: 2dbe54957636 ("brd: Refuse improperly aligned discard requests")
    Signed-off-by: Bart Van Assche
    Reviewed-by: Jan Kara
    Cc: Christoph Hellwig
    Cc: Robert Elliot
    Cc: stable # v4.4+
    Signed-off-by: Jens Axboe

    Bart Van Assche
     

16 Jan, 2016

1 commit

  • For the purpose of communicating the optional presence of a 'struct
    page' for the pfn returned from ->direct_access(), introduce a type that
    encapsulates a page-frame-number plus flags. These flags contain the
    historical "page_link" encoding for a scatterlist entry, but can also
    denote "device memory". Where "device memory" is a set of pfns that are
    not part of the kernel's linear mapping by default, but are accessed via
    the same memory controller as ram.

    The motivation for this new type is large capacity persistent memory
    that needs struct page entries in the 'memmap' to support 3rd party DMA
    (i.e. O_DIRECT I/O with a persistent memory source/target). However,
    we also need it in support of maintaining a list of mapped inodes which
    need to be unmapped at driver teardown or freeze_bdev() time.

    Signed-off-by: Dan Williams
    Cc: Christoph Hellwig
    Cc: Dave Hansen
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
     

12 Nov, 2015

1 commit


08 Nov, 2015

1 commit


09 Sep, 2015

1 commit

  • Pull libnvdimm updates from Dan Williams:
    "This update has successfully completed a 0day-kbuild run and has
    appeared in a linux-next release. The changes outside of the typical
    drivers/nvdimm/ and drivers/acpi/nfit.[ch] paths are related to the
    removal of IORESOURCE_CACHEABLE, the introduction of memremap(), and
    the introduction of ZONE_DEVICE + devm_memremap_pages().

    Summary:

    - Introduce ZONE_DEVICE and devm_memremap_pages() as a generic
    mechanism for adding device-driver-discovered memory regions to the
    kernel's direct map.

    This facility is used by the pmem driver to enable pfn_to_page()
    operations on the page frames returned by DAX ('direct_access' in
    'struct block_device_operations').

    For now, the 'memmap' allocation for these "device" pages comes
    from "System RAM". Support for allocating the memmap from device
    memory will arrive in a later kernel.

    - Introduce memremap() to replace usages of ioremap_cache() and
    ioremap_wt(). memremap() drops the __iomem annotation for these
    mappings to memory that do not have i/o side effects. The
    replacement of ioremap_cache() with memremap() is limited to the
    pmem driver to ease merging the api change in v4.3.

    Completion of the conversion is targeted for v4.4.

    - Similar to the usage of memcpy_to_pmem() + wmb_pmem() in the pmem
    driver, update the VFS DAX implementation and PMEM api to provide
    persistence guarantees for kernel operations on a DAX mapping.

    - Convert the ACPI NFIT 'BLK' driver to map the block apertures as
    cacheable to improve performance.

    - Miscellaneous updates and fixes to libnvdimm including support for
    issuing "address range scrub" commands, clarifying the optimal
    'sector size' of pmem devices, a clarification of the usage of the
    ACPI '_STA' (status) property for DIMM devices, and other minor
    fixes"

    * tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (34 commits)
    libnvdimm, pmem: direct map legacy pmem by default
    libnvdimm, pmem: 'struct page' for pmem
    libnvdimm, pfn: 'struct page' provider infrastructure
    x86, pmem: clarify that ARCH_HAS_PMEM_API implies PMEM mapped WB
    add devm_memremap_pages
    mm: ZONE_DEVICE for "device memory"
    mm: move __phys_to_pfn and __pfn_to_phys to asm/generic/memory_model.h
    dax: drop size parameter to ->direct_access()
    nd_blk: change aperture mapping from WC to WB
    nvdimm: change to use generic kvfree()
    pmem, dax: have direct_access use __pmem annotation
    dax: update I/O path to do proper PMEM flushing
    pmem: add copy_from_iter_pmem() and clear_pmem()
    pmem, x86: clean up conditional pmem includes
    pmem: remove layer when calling arch_has_wmb_pmem()
    pmem, x86: move x86 PMEM API to new pmem.h header
    libnvdimm, e820: make CONFIG_X86_PMEM_LEGACY a tristate option
    pmem: switch to devm_ allocations
    devres: add devm_memremap
    libnvdimm, btt: write and validate parent_uuid
    ...

    Linus Torvalds
     

28 Aug, 2015

1 commit


21 Aug, 2015

1 commit

  • Update the annotation for the kaddr pointer returned by direct_access()
    so that it is a __pmem pointer. This is consistent with the PMEM driver
    and with how this direct_access() pointer is used in the DAX code.

    Signed-off-by: Ross Zwisler
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Dan Williams

    Ross Zwisler
     

29 Jul, 2015

1 commit

  • Currently we have two different ways to signal an I/O error on a BIO:

    (1) by clearing the BIO_UPTODATE flag
    (2) by returning a Linux errno value to the bi_end_io callback

    The first one has the drawback of only communicating a single possible
    error (-EIO), and the second one has the drawback of not beeing persistent
    when bios are queued up, and are not passed along from child to parent
    bio in the ever more popular chaining scenario. Having both mechanisms
    available has the additional drawback of utterly confusing driver authors
    and introducing bugs where various I/O submitters only deal with one of
    them, and the others have to add boilerplate code to deal with both kinds
    of error returns.

    So add a new bi_error field to store an errno value directly in struct
    bio and remove the existing mechanisms to clean all this up.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Reviewed-by: NeilBrown
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

17 Jul, 2015

1 commit


17 Feb, 2015

1 commit

  • Since this is relating to FS_XIP, not KERNEL_XIP, it should be called
    DAX instead of XIP.

    Signed-off-by: Matthew Wilcox
    Cc: Andreas Dilger
    Cc: Boaz Harrosh
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Jan Kara
    Cc: Jens Axboe
    Cc: Kirill A. Shutemov
    Cc: Mathieu Desnoyers
    Cc: Randy Dunlap
    Cc: Ross Zwisler
    Cc: Theodore Ts'o
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     

14 Jan, 2015

3 commits

  • Because of the direct_access() API which returns a PFN. partitions
    better start on 4K boundary, else offset ZERO of a partition will
    not be aligned and blk_direct_access() will fail the call.

    By setting blk_queue_physical_block_size(PAGE_SIZE) we can communicate
    this to fdisk and friends.

    The call to blk_queue_physical_block_size() is harmless and will
    not affect the Kernel behavior in any way. It is only for
    communication to user-mode.

    before this patch running fdisk on a default size brd of 4M
    the first sector offered is 34 (BAD), but after this patch it
    will be 40, ie 8 sectors aligned. Also when entering some random
    partition sizes the next partition-start sector is offered 8 sectors
    aligned after this patch. (Please note that with fdisk the user
    can still enter bad values, only the offered default values will
    be correct)

    Note that with bdev-size > 4M fdisk will try to align on a 1M
    boundary (above first-sector will be 2048), in any case.

    CC: Martin K. Petersen
    Signed-off-by: Boaz Harrosh
    Reviewed-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Boaz Harrosh
     
  • This patch fixes up brd's partitions scheme, now enjoying all worlds.

    The MAIN fix here is that currently, if one fdisks some partitions,
    a BAD bug will make all partitions point to the same start-end sector
    ie: 0 - brd_size And an mkfs of any partition would trash the partition
    table and the other partition.

    Another fix is that "mount -U uuid" will not work if show_part was not
    specified, because of the GENHD_FL_SUPPRESS_PARTITION_INFO flag.
    We now always load without it and remove the show_part parameter.

    [We remove Dmitry's new module-param part_show it is now always
    show]

    So NOW the logic goes like this:
    * max_part - Just says how many minors to reserve between ramX
    devices. In any way, there can be as many partition as requested.
    If minors between devices ends, then dynamic 259-major ids will
    be allocated on the fly.
    The default is now max_part=1, which means all partitions devt(s)
    will be from the dynamic (259) major-range.
    (If persistent partition minors is needed use max_part=X)
    For example with /dev/sdX max_part is hard coded 16.

    * Creation of new devices on the fly still/always work:
    mknod /path/devnod b 1 X
    fdisk -l /path/devnod
    Will create a new device if [X / max_part] was not already
    created before. (Just as before)

    partitions on the dynamically created device will work as well
    Same logic applies with minors as with the pre-created ones.

    TODO: dynamic grow of device size. So each device can have it's
    own size.

    CC: Dmitry Monakhov
    Tested-by: Ross Zwisler
    Signed-off-by: Boaz Harrosh
    Signed-off-by: Jens Axboe

    Boaz Harrosh
     
  • In order to support accesses to larger chunks of memory, pass in a
    'size' parameter (counted in bytes), and return the amount available at
    that address.

    Add a new helper function, bdev_direct_access(), to handle common
    functionality including partition handling, checking the length requested
    is positive, checking for the sector being page-aligned, and checking
    the length of the request does not pass the end of the partition.

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Jan Kara
    Reviewed-by: Boaz Harrosh
    Signed-off-by: Jens Axboe

    Matthew Wilcox
     

22 Aug, 2014

1 commit

  • Currenly ram disk is not visiable inside /proc/partitions. This was
    done for compatibility reasons here: 53978d0a7a27. But some utilities
    expect disk presents in /proc/partitions.
    Let's add module's option and let's administrator chose visibility behaviour.
    By default, old behaviour preserved.

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: Jens Axboe

    Dmitry Monakhov
     

05 Jun, 2014

2 commits


24 Nov, 2013

2 commits

  • More prep work for immutable biovecs - with immutable bvecs drivers
    won't be able to use the biovec directly, they'll need to use helpers
    that take into account bio->bi_iter.bi_bvec_done.

    This updates callers for the new usage without changing the
    implementation yet.

    Signed-off-by: Kent Overstreet
    Cc: Jens Axboe
    Cc: Geert Uytterhoeven
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: "Ed L. Cashin"
    Cc: Nick Piggin
    Cc: Lars Ellenberg
    Cc: Jiri Kosina
    Cc: Paul Clements
    Cc: Jim Paris
    Cc: Geoff Levand
    Cc: Yehuda Sadeh
    Cc: Sage Weil
    Cc: Alex Elder
    Cc: ceph-devel@vger.kernel.org
    Cc: Joshua Morris
    Cc: Philip Kelleher
    Cc: Konrad Rzeszutek Wilk
    Cc: Jeremy Fitzhardinge
    Cc: Neil Brown
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: linux390@de.ibm.com
    Cc: Nagalakshmi Nandigama
    Cc: Sreekanth Reddy
    Cc: support@lsi.com
    Cc: "James E.J. Bottomley"
    Cc: Greg Kroah-Hartman
    Cc: Alexander Viro
    Cc: Steven Whitehouse
    Cc: Herton Ronaldo Krzesinski
    Cc: Tejun Heo
    Cc: Andrew Morton
    Cc: Guo Chao
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Matthew Wilcox
    Cc: Keith Busch
    Cc: Stephen Hemminger
    Cc: Quoc-Son Anh
    Cc: Sebastian Ott
    Cc: Nitin Gupta
    Cc: Minchan Kim
    Cc: Jerome Marchand
    Cc: Seth Jennings
    Cc: "Martin K. Petersen"
    Cc: Mike Snitzer
    Cc: Vivek Goyal
    Cc: "Darrick J. Wong"
    Cc: Chris Metcalf
    Cc: Jan Kara
    Cc: linux-m68k@lists.linux-m68k.org
    Cc: linuxppc-dev@lists.ozlabs.org
    Cc: drbd-user@lists.linbit.com
    Cc: nbd-general@lists.sourceforge.net
    Cc: cbe-oss-dev@lists.ozlabs.org
    Cc: xen-devel@lists.xensource.com
    Cc: virtualization@lists.linux-foundation.org
    Cc: linux-raid@vger.kernel.org
    Cc: linux-s390@vger.kernel.org
    Cc: DL-MPTFusionLinux@lsi.com
    Cc: linux-scsi@vger.kernel.org
    Cc: devel@driverdev.osuosl.org
    Cc: linux-fsdevel@vger.kernel.org
    Cc: cluster-devel@redhat.com
    Cc: linux-mm@kvack.org
    Acked-by: Geoff Levand

    Kent Overstreet
     
  • Immutable biovecs are going to require an explicit iterator. To
    implement immutable bvecs, a later patch is going to add a bi_bvec_done
    member to this struct; for now, this patch effectively just renames
    things.

    Signed-off-by: Kent Overstreet
    Cc: Jens Axboe
    Cc: Geert Uytterhoeven
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: "Ed L. Cashin"
    Cc: Nick Piggin
    Cc: Lars Ellenberg
    Cc: Jiri Kosina
    Cc: Matthew Wilcox
    Cc: Geoff Levand
    Cc: Yehuda Sadeh
    Cc: Sage Weil
    Cc: Alex Elder
    Cc: ceph-devel@vger.kernel.org
    Cc: Joshua Morris
    Cc: Philip Kelleher
    Cc: Rusty Russell
    Cc: "Michael S. Tsirkin"
    Cc: Konrad Rzeszutek Wilk
    Cc: Jeremy Fitzhardinge
    Cc: Neil Brown
    Cc: Alasdair Kergon
    Cc: Mike Snitzer
    Cc: dm-devel@redhat.com
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: linux390@de.ibm.com
    Cc: Boaz Harrosh
    Cc: Benny Halevy
    Cc: "James E.J. Bottomley"
    Cc: Greg Kroah-Hartman
    Cc: "Nicholas A. Bellinger"
    Cc: Alexander Viro
    Cc: Chris Mason
    Cc: "Theodore Ts'o"
    Cc: Andreas Dilger
    Cc: Jaegeuk Kim
    Cc: Steven Whitehouse
    Cc: Dave Kleikamp
    Cc: Joern Engel
    Cc: Prasad Joshi
    Cc: Trond Myklebust
    Cc: KONISHI Ryusuke
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Ben Myers
    Cc: xfs@oss.sgi.com
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Len Brown
    Cc: Pavel Machek
    Cc: "Rafael J. Wysocki"
    Cc: Herton Ronaldo Krzesinski
    Cc: Ben Hutchings
    Cc: Andrew Morton
    Cc: Guo Chao
    Cc: Tejun Heo
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Wei Yongjun
    Cc: "Roger Pau Monné"
    Cc: Jan Beulich
    Cc: Stefano Stabellini
    Cc: Ian Campbell
    Cc: Sebastian Ott
    Cc: Christian Borntraeger
    Cc: Minchan Kim
    Cc: Jiang Liu
    Cc: Nitin Gupta
    Cc: Jerome Marchand
    Cc: Joe Perches
    Cc: Peng Tao
    Cc: Andy Adamson
    Cc: fanchaoting
    Cc: Jie Liu
    Cc: Sunil Mushran
    Cc: "Martin K. Petersen"
    Cc: Namjae Jeon
    Cc: Pankaj Kumar
    Cc: Dan Magenheimer
    Cc: Mel Gorman 6

    Kent Overstreet
     

08 Nov, 2013

1 commit

  • The probe function is supposed to return NULL on failure (as we can see in
    kobj_lookup: kobj = probe(dev, index, data); ... if (kobj) return kobj;

    However, in loop and brd, it returns negative error from ERR_PTR.

    This causes a crash if we simulate disk allocation failure and run
    less -f /dev/loop0 because the negative number is interpreted as a pointer:

    BUG: unable to handle kernel NULL pointer dereference at 00000000000002b4
    IP: [] __blkdev_get+0x28/0x450
    PGD 23c677067 PUD 23d6d1067 PMD 0
    Oops: 0000 [#1] PREEMPT SMP
    Modules linked in: loop hpfs nvidia(PO) ip6table_filter ip6_tables uvesafb cfbcopyarea cfbimgblt cfbfillrect fbcon font bitblit fbcon_rotate fbcon_cw fbcon_ud fbcon_ccw softcursor fb fbdev msr ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_state ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc tun ipv6 cpufreq_stats cpufreq_ondemand cpufreq_userspace cpufreq_powersave cpufreq_conservative hid_generic spadfs usbhid hid fuse raid0 snd_usb_audio snd_pcm_oss snd_mixer_oss md_mod snd_pcm snd_timer snd_page_alloc snd_hwdep snd_usbmidi_lib dmi_sysfs snd_rawmidi nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack snd soundcore lm85 hwmon_vid ohci_hcd ehci_pci ehci_hcd serverworks sata_svw libata acpi_cpufreq freq_table mperf ide_core usbcore kvm_amd kvm tg3 i2c_piix4 libphy microcode e100 usb_common ptp skge i2c_core pcspkr k10temp evdev floppy hwmon pps_core mii rtc_cmos button processor unix [last unloaded: nvidia]
    CPU: 1 PID: 6831 Comm: less Tainted: P W O 3.10.15-devel #18
    Hardware name: empty empty/S3992-E, BIOS 'V1.06 ' 06/09/2009
    task: ffff880203cc6bc0 ti: ffff88023e47c000 task.ti: ffff88023e47c000
    RIP: 0010:[] [] __blkdev_get+0x28/0x450
    RSP: 0018:ffff88023e47dbd8 EFLAGS: 00010286
    RAX: ffffffffffffff74 RBX: ffffffffffffff74 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000001
    RBP: ffff88023e47dc18 R08: 0000000000000002 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000000 R12: ffff88023f519658
    R13: ffffffff8118c300 R14: 0000000000000000 R15: ffff88023f519640
    FS: 00007f2070bf7700(0000) GS:ffff880247400000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00000000000002b4 CR3: 000000023da1d000 CR4: 00000000000007e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Stack:
    0000000000000002 0000001d00000000 000000003e47dc50 ffff88023f519640
    ffff88043d5bb668 ffffffff8118c300 ffff88023d683550 ffff88023e47de60
    ffff88023e47dc98 ffffffff8118c10d 0000001d81605698 0000000000000292
    Call Trace:
    [] ? blkdev_get_by_dev+0x60/0x60
    [] blkdev_get+0x1dd/0x370
    [] ? blkdev_get_by_dev+0x60/0x60
    [] ? _raw_spin_unlock+0x2c/0x50
    [] ? blkdev_get_by_dev+0x60/0x60
    [] blkdev_open+0x65/0x80
    [] do_dentry_open.isra.18+0x23e/0x2f0
    [] finish_open+0x34/0x50
    [] do_last.isra.62+0x2d2/0xc50
    [] path_openat.isra.63+0xb8/0x4d0
    [] ? might_fault+0x4e/0xa0
    [] do_filp_open+0x40/0x90
    [] ? _raw_spin_unlock+0x2c/0x50
    [] ? __alloc_fd+0xa5/0x1f0
    [] do_sys_open+0xef/0x1d0
    [] SyS_open+0x19/0x20
    [] system_call_fastpath+0x1a/0x1f
    Code: 44 00 00 55 48 89 e5 41 57 49 89 ff 41 56 41 89 d6 41 55 41 54 4c 8d 67 18 53 48 83 ec 18 89 75 cc e9 f2 00 00 00 0f 1f 44 00 00 8b 80 40 03 00 00 48 89 df 4c 8b 68 58 e8 d5
    a4 07 00 44 89
    RIP [] __blkdev_get+0x28/0x450
    RSP
    CR2: 00000000000002b4
    ---[ end trace bb7f32dbf02398dc ]---

    The brd change should be backported to stable kernels starting with 2.6.25.
    The loop change should be backported to stable kernels starting with 2.6.22.

    Signed-off-by: Mikulas Patocka
    Acked-by: Tejun Heo
    Cc: stable@kernel.org # 2.6.22+
    Signed-off-by: Jens Axboe

    Mikulas Patocka
     

25 May, 2013

1 commit

  • The index on the page must be set before it is inserted in the radix
    tree. Otherwise there is a small race which can occur during lookup
    where the page can be found with the incorrect index. This will trigger
    the BUG_ON() in brd_lookup_page().

    Signed-off-by: Brian Behlendorf
    Reported-by: Chris Wedgwood
    Cc: Jens Axboe
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Brian Behlendorf
     

24 Mar, 2013

1 commit

  • Just a little convenience macro - main reason to add it now is preparing
    for immutable bio vecs, it'll reduce the size of the patch that puts
    bi_sector/bi_size/bi_idx into a struct bvec_iter.

    Signed-off-by: Kent Overstreet
    CC: Jens Axboe
    CC: Lars Ellenberg
    CC: Jiri Kosina
    CC: Alasdair Kergon
    CC: dm-devel@redhat.com
    CC: Neil Brown
    CC: Martin Schwidefsky
    CC: Heiko Carstens
    CC: linux-s390@vger.kernel.org
    CC: Chris Mason
    CC: Steven Whitehouse
    Acked-by: Steven Whitehouse

    Kent Overstreet
     

20 Mar, 2012

1 commit


04 Jan, 2012

1 commit

  • Move invalidate_bdev, block_sync_page into fs/block_dev.c. Export
    kill_bdev as well, so brd doesn't have to open code it. Reduce
    buffer_head.h requirement accordingly.

    Removed a rather large comment from invalidate_bdev, as it looked a bit
    obsolete to bother moving. The small comment replacing it says enough.

    Signed-off-by: Nick Piggin
    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Al Viro
     

12 Sep, 2011

1 commit

  • There is very little benefit in allowing to let a ->make_request
    instance update the bios device and sector and loop around it in
    __generic_make_request when we can archive the same through calling
    generic_make_request from the driver and letting the loop in
    generic_make_request handle it.

    Note that various drivers got the return value from ->make_request and
    returned non-zero values for errors.

    Signed-off-by: Christoph Hellwig
    Acked-by: NeilBrown
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

27 May, 2011

2 commits

  • Export 'rd_nr', 'rd_size' and 'max_part' parameters to sysfs so user can
    know that how many devices are allowed, how big each device is and how
    many partitions are supported. If 'max_part' is 0, it means simply the
    device doesn't support partitioning.

    Also note that 'max_part' can be adjusted to power of 2 minus 1 form if
    needed. User should check this value after the module loading if he/she
    want to use that number correctly (i.e. fdisk, mknod, etc.).

    Signed-off-by: Namhyung Kim
    Cc: Laurent Vivier
    Signed-off-by: Jens Axboe

    Namhyung Kim
     
  • If 'rd_nr' param was not specified, 16 (can be adjusted via
    CONFIG_BLK_DEV_RAM_COUNT) devices would be created by default
    but comment said 1. Fix it.

    Signed-off-by: Namhyung Kim
    Signed-off-by: Jens Axboe

    Namhyung Kim