19 Sep, 2016

1 commit


15 Sep, 2016

16 commits

  • Fixes 1b157939f92a ("blk-mq: get rid of the cpumask in struct blk_mq_tags")
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Unused now that NVMe sets up irq affinity before calling into blk-mq.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Keith Busch
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • No need now that we don't have to reverse engineer the irq affinity.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Keith Busch
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Use the new helper to automatically select the right interrupt type, as
    well as to use the automatic interrupt affinity assignment.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Keith Busch
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Signed-off-by: Christoph Hellwig
    Reviewed-by: Keith Busch
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • This allows drivers to specify their own queue mapping by overriding the
    setup-time function that builds the mq_map. This can be used for
    example to build the map based on the MSI-X vector mapping provided
    by the core interrupt layer for PCI devices.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Keith Busch
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • All drivers use the default, so provide an inline version of it. If we
    ever need other queue mapping we can add an optional method back,
    although supporting it will also require major changes to the queue setup
    code.

    This provides better code generation, and better debuggability as well.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Keith Busch
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • The mapping is identical for all queues in a tag_set, so stop wasting
    memory by building multiple copies. Note that for now I've kept the mq_map
    pointer in the request_queue, but we'll need to investigate if we can
    remove it without suffering too much from the additional pointer chasing.
    The same would apply to the mq_ops pointer as well.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Keith Busch
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Currently blk-mq totally remaps hardware contexts when a CPU hotplug
    event happens, which causes major havoc for drivers, as they are never
    told about this remapping. E.g. any carefully sorted out CPU affinity
    will just be completely messed up.

    The rebuild also doesn't really help for the common case of cpu
    hotplug, which is soft onlining / offlining of cpus - in this case we
    should just leave the queue and irq mapping as is. If it actually
    worked it would have helped in the case of physical cpu hotplug,
    although for that we'd need a way to actually notify the driver.
    Note that drivers may already be able to accommodate such a topology
    change on their own, e.g. using the reset_controller sysfs file in NVMe
    will cause the driver to get things right for this case.

    With the rebuild removed we will simply retain the queue mapping for
    a soft offlined CPU that will work when it comes back online, and will
    map any newly onlined CPU to queue 0 until the driver initiates
    a rebuild of the queue map.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Keith Busch
    Signed-off-by: Jens Axboe

    Christoph Hellwig
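
The retained-mapping behaviour described in the commit above can be pictured with a small user-space sketch. This is not the blk-mq code: `sketch_map`, `sketch_build_map()` and `sketch_cpu_to_queue()` are hypothetical names, and the round-robin assignment is an assumption made only for illustration. It shows a map that is built once, is not touched when a CPU goes offline or comes back, and leaves any CPU unknown at build time on queue 0 until a rebuild.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_NR_CPUS 8   /* hypothetical number of possible CPUs */

/* Hypothetical stand-in for a per-tag-set CPU-to-queue map. */
struct sketch_map {
    unsigned int mq_map[SKETCH_NR_CPUS];
};

/*
 * Rebuild the map: CPUs flagged in present[] are spread round-robin over
 * nr_queues (an assumed policy). All other CPUs keep the zeroed entry,
 * which means queue 0.
 */
void sketch_build_map(struct sketch_map *m, const int *present,
                      unsigned int nr_queues)
{
    unsigned int next = 0;

    assert(nr_queues > 0);
    memset(m, 0, sizeof(*m));
    for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
        if (present[cpu])
            m->mq_map[cpu] = next++ % nr_queues;
    }
}

/*
 * The lookup never consults online state, so a soft offline followed by an
 * online leaves the mapping exactly as it was.
 */
unsigned int sketch_cpu_to_queue(const struct sketch_map *m, int cpu)
{
    assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
    return m->mq_map[cpu];
}
```

In this sketch, a CPU that was absent at build time resolves to queue 0, and only an explicit call to `sketch_build_map()` (standing in for a driver-initiated rebuild) gives it a queue of its own.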
     
  • …p/tip into for-4.9/msi-irq

    Jens Axboe
     
  • Add a helper to get the affinity mask for a given PCI irq vector. For MSI or
    MSI-X vectors these are stored by the IRQ core, while for legacy interrupts
    we will always return cpu_possible_map.

    [hch: updated to follow the style of pci_irq_vector()]

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Christoph Hellwig
    Cc: axboe@fb.com
    Cc: keith.busch@intel.com
    Cc: agordeev@redhat.com
    Cc: linux-block@vger.kernel.org
    Link: http://lkml.kernel.org/r/1473862739-15032-6-git-send-email-hch@lst.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • No more users.

    Signed-off-by: Thomas Gleixner
    Cc: Christoph Hellwig
    Cc: axboe@fb.com
    Cc: keith.busch@intel.com
    Cc: agordeev@redhat.com
    Cc: linux-block@vger.kernel.org
    Link: http://lkml.kernel.org/r/1473862739-15032-5-git-send-email-hch@lst.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • Switch MSI over to the new spreading code. If a pci device contains a valid
    pointer to a cpumask, then this mask is used for spreading; otherwise the
    online cpu mask is used. This allows a driver to restrict the spread to a
    subset of CPUs, e.g. cpus on a particular node.

    Signed-off-by: Thomas Gleixner
    Cc: Christoph Hellwig
    Cc: axboe@fb.com
    Cc: keith.busch@intel.com
    Cc: agordeev@redhat.com
    Cc: linux-block@vger.kernel.org
    Link: http://lkml.kernel.org/r/1473862739-15032-4-git-send-email-hch@lst.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • The current irq spreading infrastructure is just looking at a cpumask and
    tries to spread the interrupts over the mask. That's suboptimal as it does
    not take numa nodes into account.

    Change the logic so the interrupts are spread across numa nodes and inside
    the nodes. If there are more cpus than vectors per node, then we set the
    affinity to several cpus. If HT siblings are available we take that into
    account and try to set all siblings to a single vector.

    Signed-off-by: Thomas Gleixner
    Cc: Christoph Hellwig
    Cc: axboe@fb.com
    Cc: keith.busch@intel.com
    Cc: agordeev@redhat.com
    Cc: linux-block@vger.kernel.org
    Link: http://lkml.kernel.org/r/1473862739-15032-3-git-send-email-hch@lst.de

    Thomas Gleixner
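
A simplified user-space sketch of the two-level spreading idea in the commit above: vectors are first divided across NUMA nodes, then the CPUs inside each node are grouped onto that node's vectors, so several CPUs share a vector when a node has more CPUs than vectors. The function name `spread_vectors()`, the `cpu_node[]` input array and the even split with remainder are all assumptions for illustration; the sketch ignores HT siblings and is not the kernel's implementation.

```c
#include <assert.h>

/*
 * cpu_node[cpu] is the NUMA node of each CPU (hypothetical input).
 * On return, cpu_vec[cpu] holds the vector index chosen for that CPU.
 * Assumes nvecs >= nnodes, so every node receives at least one vector.
 */
void spread_vectors(const int *cpu_node, int ncpus, int nnodes, int nvecs,
                    int *cpu_vec)
{
    int first_vec = 0;

    assert(nnodes > 0 && nvecs >= nnodes);
    for (int node = 0; node < nnodes; node++) {
        /* Even share per node; the remainder goes to the first nodes. */
        int vecs = nvecs / nnodes + (node < nvecs % nnodes ? 1 : 0);
        int cpus_on_node = 0;
        int seen = 0;

        for (int cpu = 0; cpu < ncpus; cpu++)
            if (cpu_node[cpu] == node)
                cpus_on_node++;

        /* Map the k-th CPU of this node onto one of its `vecs` vectors. */
        for (int cpu = 0; cpu < ncpus; cpu++) {
            if (cpu_node[cpu] != node)
                continue;
            cpu_vec[cpu] = first_vec + (seen * vecs) / cpus_on_node;
            seen++;
        }
        first_vec += vecs;
    }
}
```

With 8 CPUs split evenly over 2 nodes and 4 vectors, each node gets 2 vectors and each vector serves 2 CPUs of the same node.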
     
  • For irq spreading we want to store affinity masks in the msi_entry. Add the
    infrastructure for it.

    We allocate an array of cpumasks with an array size of the number of used
    vectors in the entry, so we can hand in the information per linux interrupt
    later.

    As we hand in the number of used vectors, we assign them right
    away. Convert all the call sites.

    Signed-off-by: Thomas Gleixner
    Cc: axboe@fb.com
    Cc: keith.busch@intel.com
    Cc: agordeev@redhat.com
    Cc: linux-block@vger.kernel.org
    Cc: Christoph Hellwig
    Link: http://lkml.kernel.org/r/1473862739-15032-2-git-send-email-hch@lst.de

    Thomas Gleixner
     
  • blk_mq_delay_kick_requeue_list() provides the ability to kick the
    q->requeue_list after a specified time. To do this the request_queue's
    'requeue_work' member was changed to a delayed_work.

    blk_mq_delay_kick_requeue_list() allows DM to defer processing requeued
    requests while it doesn't make sense to immediately requeue them
    (e.g. when all paths in a DM multipath have failed).

    Signed-off-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Mike Snitzer
     

14 Sep, 2016

11 commits

  • Signed-off-by: Christoph Hellwig
    Reviewed-by: Bart Van Assche
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Signed-off-by: Christoph Hellwig
    Reviewed-by: Bart Van Assche
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Since REQ_OP_BITS == 3 and __REQ_NR_BITS == 30 it is not that hard
    to pass an op_flags argument to bio_set_op_attrs() that is larger
    than the number of bits reserved for the op_flags argument. Complain
    if this happens. Additionally, ensure that negative arguments trigger
    a complaint (1 << ... is signed while 1U << ... is unsigned; adding
    0U to an integer expression causes it to be promoted to an unsigned
    type).

    Signed-off-by: Bart Van Assche
    Cc: Mike Christie
    Cc: Christoph Hellwig
    Cc: Hannes Reinecke
    Cc: Damien Le Moal
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Bart Van Assche
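
The signed-versus-unsigned point in the commit above can be shown with a small standalone sketch. `SKETCH_OP_SHIFT` and `op_flags_fit()` are hypothetical names, and the sketch assumes a 32-bit `unsigned int`; only `REQ_OP_BITS == 3` is taken from the commit message. Adding `0U` converts a negative `int` into a very large unsigned value, so one comparison rejects both oversized and negative arguments.

```c
#include <stdbool.h>

#define REQ_OP_BITS 3   /* value quoted in the commit message */

/* Hypothetical: bits left for flags once the op bits are reserved. */
#define SKETCH_OP_SHIFT (8 * sizeof(unsigned int) - REQ_OP_BITS)

/*
 * Returns true when op_flags fits below the bits reserved for the op.
 * (op_flags + 0U) has unsigned type, so -1 becomes UINT_MAX and fails
 * the comparison instead of slipping through as a small signed value.
 */
bool op_flags_fit(int op_flags)
{
    return (op_flags + 0U) < (1U << SKETCH_OP_SHIFT);
}
```

A caller would complain (for example with a warning) whenever `op_flags_fit()` returns false.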
     
  • Introduce the bio_flags() macro. Ensure that the second argument of
    bio_set_op_attrs() only contains flags and no operation. This patch
    does not change any functionality.

    Signed-off-by: Bart Van Assche
    Cc: Mike Christie
    Cc: Chris Mason (maintainer:BTRFS FILE SYSTEM)
    Cc: Josef Bacik (maintainer:BTRFS FILE SYSTEM)
    Cc: Mike Snitzer
    Cc: Hannes Reinecke
    Cc: Damien Le Moal
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Bart Van Assche
     
  • Make it clear that the sizeof(unsigned int) expression in BIO_OP_SHIFT
    refers to the bi_opf member of struct bio.

    Signed-off-by: Bart Van Assche
    Cc: Mike Christie
    Cc: Christoph Hellwig
    Cc: Hannes Reinecke
    Cc: Damien Le Moal
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Bart Van Assche
     
  • commit e1defc4ff0cf57aca6c5e3ff99fa503f5943c1f1
    "block: Do away with the notion of hardsect_size"
    removed the notion of "hardware sector size" from
    the kernel in favor of logical block size, but
    references remain in comments and documentation.

    Update the remaining sites mentioning hardsect.

    Signed-off-by: Linus Walleij
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Linus Walleij
     
  • The blk_mq_alloc_single_hw_queue() is a prototype artifact that
    should have been removed with
    commit cdef54dd85ad66e77262ea57796a3e81683dd5d6
    "blk-mq: remove alloc_hctx and free_hctx methods" where the last
    users of it were deleted.

    Fixes: cdef54dd85ad ("blk-mq: remove alloc_hctx and free_hctx methods")
    Signed-off-by: Linus Walleij
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Linus Walleij
     
  • DAX support for block devices was removed in commits 03cdad
    ("block: disable block device DAX by default") and 99a01cd
    ("block: remove BLK_DEV_DAX config option"), but we still kept a call to
    dax_do_io and some unneeded i_flags manipulations introduced in commit
    bbab37 ("block: Add support for DAX reads/writes to block devices").

    Remove those leftovers.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Johannes Thumshirn
    Acked-by: Dan Williams
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Allow the io_poll statistics to be zeroed to make for easier logging
    of polling events.

    Signed-off-by: Stephen Bates
    Acked-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Stephen Bates
     
  • In order to help determine the effectiveness of polling in a running
    system it is useful to determine the ratio of how often the poll
    function is called vs how often the completion is checked. For this
    reason we add a poll_considered variable and add it to the sysfs entry
    for io_poll.

    Signed-off-by: Stephen Bates
    Acked-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Stephen Bates
     

12 Sep, 2016

4 commits

  • Linus Torvalds
     
  • Commit aa71987472a9 ("nvme: fabrics drivers don't need the nvme-pci
    driver") removed the dependency on BLK_DEV_NVME, but the cdoe does
    depend on the block layer (which used to be an implicit dependency
    through BLK_DEV_NVME).

    Otherwise you get various errors from the kbuild test robot random
    config testing when that happens to hit a configuration with BLOCK
    device support disabled.

    Cc: Christoph Hellwig
    Cc: Jay Freyensee
    Cc: Sagi Grimberg
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Pull IIO fixes from Greg KH:
    "Here are a few small IIO fixes for 4.8-rc6.

    Nothing major, full details are in the shortlog, all of these have
    been in linux-next with no reported issues"

    * tag 'staging-4.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
    iio:core: fix IIO_VAL_FRACTIONAL sign handling
    iio: ensure ret is initialized to zero before entering do loop
    iio: accel: kxsd9: Fix scaling bug
    iio: accel: bmc150: reset chip at init time
    iio: fix pressure data output unit in hid-sensor-attributes
    tools:iio:iio_generic_buffer: fix trigger-less mode

    Linus Torvalds
     
  • Pull USB fixes from Greg KH:
    "Here are some small USB gadget, phy, and xhci fixes for 4.8-rc6.

    All of these resolve minor issues that have been reported, and all
    have been in linux-next with no reported issues"

    * tag 'usb-4.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
    usb: chipidea: udc: fix NULL ptr dereference in isr_setup_status_phase
    xhci: fix null pointer dereference in stop command timeout function
    usb: dwc3: pci: fix build warning on !PM_SLEEP
    usb: gadget: prevent potenial null pointer dereference on skb->len
    usb: renesas_usbhs: fix clearing the {BRDY,BEMP}STS condition
    usb: phy: phy-generic: Check clk_prepare_enable() error
    usb: gadget: udc: renesas-usb3: clear VBOUT bit in DRD_CON
    Revert "usb: dwc3: gadget: always decrement by 1"

    Linus Torvalds
     

11 Sep, 2016

3 commits

  • Pull libnvdimm fixes from Dan Williams:
    "nvdimm fixes for v4.8, two of them are tagged for -stable:

    - Fix devm_memremap_pages() to use track_pfn_insert(). Otherwise,
    DAX pmd mappings end up with an uncached pgprot, and unusable
    performance for the device-dax interface. The device-dax interface
    appeared in 4.7 so this is tagged for -stable.

    - Fix a couple VM_BUG_ON() checks in the show_smaps() path to
    understand DAX pmd entries. This fix is tagged for -stable.

    - Fix a mis-merge of the nfit machine-check handler to flip the
    polarity of an if() to match the final version of the patch that
    Vishal sent for 4.8-rc1. Without this the nfit machine check
    handler never detects / inserts new 'badblocks' entries which
    applications use to identify lost portions of files.

    - For test purposes, fix the nvdimm_clear_poison() path to operate on
    legacy / simulated nvdimm memory ranges. Without this fix a test
    can set badblocks, but never clear them on these ranges.

    - Fix the range checking done by dax_dev_pmd_fault(). This is not
    tagged for -stable since this problem is mitigated by specifying
    aligned resources at device-dax setup time.

    These patches have appeared in a next release over the past week. The
    recent rebase you can see in the timestamps was to drop an invalid fix
    as identified by the updated device-dax unit tests [1]. The -mm
    touches have an ack from Andrew"

    [1]: "[ndctl PATCH 0/3] device-dax test for recent kernel bugs"
    https://lists.01.org/pipermail/linux-nvdimm/2016-September/006855.html

    * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
    libnvdimm: allow legacy (e820) pmem region to clear bad blocks
    nfit, mce: Fix SPA matching logic in MCE handler
    mm: fix cache mode of dax pmd mappings
    mm: fix show_smap() for zone_device-pmd ranges
    dax: fix mapping size check

    Linus Torvalds
     
  • Pull i2c fixes from Wolfram Sang:
    "Mostly driver bugfixes, but also a few cleanups which are nice to have
    out of the way"

    * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
    i2c: rk3x: Restore clock settings at resume time
    i2c: Spelling s/acknowedge/acknowledge/
    i2c: designware: save the preset value of DW_IC_SDA_HOLD
    Documentation: i2c: slave-interface: add note for driver development
    i2c: mux: demux-pinctrl: run properly with multiple instances
    i2c: bcm-kona: fix inconsistent indenting
    i2c: rcar: use proper device with dma_mapping_error
    i2c: sh_mobile: use proper device with dma_mapping_error
    i2c: mux: demux-pinctrl: invalidate properly when switching fails

    Linus Torvalds
     
  • Pull fscrypto fixes from Ted Ts'o:
    "Fix some brown-paper-bag bugs for fscrypto, including one which
    allows a malicious user to set an encryption policy on an empty
    directory which they do not own"

    * tag 'for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    fscrypto: require write access to mount to set encryption policy
    fscrypto: only allow setting encryption policy on directories
    fscrypto: add authorization check for setting encryption policy

    Linus Torvalds
     

10 Sep, 2016

5 commits

  • Since setting an encryption policy requires writing metadata to the
    filesystem, it should be guarded by mnt_want_write/mnt_drop_write.
    Otherwise, a user could cause a write to a frozen or readonly
    filesystem. This was handled correctly by f2fs but not by ext4. Make
    fscrypt_process_policy() handle it rather than relying on the filesystem
    to get it right.

    Signed-off-by: Eric Biggers
    Cc: stable@vger.kernel.org # 4.1+; check fs/{ext4,f2fs}
    Signed-off-by: Theodore Ts'o
    Acked-by: Jaegeuk Kim

    Eric Biggers
     
  • The FS_IOC_SET_ENCRYPTION_POLICY ioctl allowed setting an encryption
    policy on nondirectory files. This was unintentional, and in the case
    of nonempty regular files did not behave as expected because existing
    data was not actually encrypted by the ioctl.

    In the case of ext4, the user could also trigger filesystem errors in
    ->empty_dir(), e.g. due to mismatched "directory" checksums when the
    kernel incorrectly tried to interpret a regular file as a directory.

    This bug affected ext4 with kernels v4.8-rc1 or later and f2fs with
    kernels v4.6 and later. It appears that older kernels only permitted
    directories and that the check was accidentally lost during the
    refactoring to share the file encryption code between ext4 and f2fs.

    This patch restores the !S_ISDIR() check that was present in older
    kernels.

    Signed-off-by: Eric Biggers
    Cc: stable@vger.kernel.org
    Signed-off-by: Theodore Ts'o

    Eric Biggers
     
  • On an ext4 or f2fs filesystem with file encryption supported, a user
    could set an encryption policy on any empty directory(*) to which they
    had readonly access. This is obviously problematic, since such a
    directory might be owned by another user and the new encryption policy
    would prevent that other user from creating files in their own directory
    (for example).

    Fix this by requiring inode_owner_or_capable() permission to set an
    encryption policy. This means that either the caller must own the file,
    or the caller must have the capability CAP_FOWNER.

    (*) Or also on any regular file, for f2fs v4.6 and later and ext4
    v4.8-rc1 and later; a separate bug fix is coming for that.

    Signed-off-by: Eric Biggers
    Cc: stable@vger.kernel.org # 4.1+; check fs/{ext4,f2fs}
    Signed-off-by: Theodore Ts'o

    Eric Biggers
     
  • Bad blocks can be injected via /sys/block/pmemN/badblocks. In a situation
    where legacy pmem is being used or a pmem region was created with the memmap
    kernel parameter, the injected bad blocks are not cleared because
    nvdimm_clear_poison() fails for lack of an ndctl function pointer. In
    this case we need to just return as handled and allow the bad blocks to
    be cleared rather than fail.

    Reviewed-by: Vishal Verma
    Signed-off-by: Dave Jiang
    Signed-off-by: Dan Williams

    Dave Jiang
     
  • The check for a 'pmem' type SPA in the MCE handler was inverted due to a
    merge/rebase error.

    Fixes: 6839a6d nfit: do an ARS scrub on hitting a latent media error
    Cc: linux-acpi@vger.kernel.org
    Cc: Dan Williams
    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma