19 Sep, 2016
1 commit
-
and building block/blk-mq-pci.o should depend on CONFIG_BLOCK
Fixes: 973c4e372c8f ("blk-mq: provide a default queue mapping for PCI device")
Signed-off-by: Stephen Rothwell
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe
15 Sep, 2016
16 commits
-
Fixes 1b157939f92a ("blk-mq: get rid of the cpumask in struct blk_mq_tags")
Signed-off-by: Jens Axboe -
Unused now that NVMe sets up irq affinity before calling into blk-mq.
Signed-off-by: Christoph Hellwig
Reviewed-by: Keith Busch
Signed-off-by: Jens Axboe -
No need now that we don't have to reverse engineer the irq affinity.
Signed-off-by: Christoph Hellwig
Reviewed-by: Keith Busch
Signed-off-by: Jens Axboe -
Use the new helper to automatically select the right interrupt type, as
well as to use the automatic interupt affinity assignment.Signed-off-by: Christoph Hellwig
Reviewed-by: Keith Busch
Signed-off-by: Jens Axboe -
Signed-off-by: Christoph Hellwig
Reviewed-by: Keith Busch
Signed-off-by: Jens Axboe -
This allows drivers specify their own queue mapping by overriding the
setup-time function that builds the mq_map. This can be used for
example to build the map based on the MSI-X vector mapping provided
by the core interrupt layer for PCI devices.Signed-off-by: Christoph Hellwig
Reviewed-by: Keith Busch
Signed-off-by: Jens Axboe -
All drivers use the default, so provide an inline version of it. If we
ever need other queue mapping we can add an optional method back,
although supporting will also require major changes to the queue setup
code.This provides better code generation, and better debugability as well.
Signed-off-by: Christoph Hellwig
Reviewed-by: Keith Busch
Signed-off-by: Jens Axboe -
The mapping is identical for all queues in a tag_set, so stop wasting
memory for building multiple. Note that for now I've kept the mq_map
pointer in the request_queue, but we'll need to investigate if we can
remove it without suffering too much from the additional pointer chasing.
The same would apply to the mq_ops pointer as well.Signed-off-by: Christoph Hellwig
Reviewed-by: Keith Busch
Signed-off-by: Jens Axboe -
Currently blk-mq will totally remap hardware context when a CPU hotplug
even happened, which causes major havoc for drivers, as they are never
told about this remapping. E.g. any carefully sorted out CPU affinity
will just be completely messed up.The rebuild also doesn't really help for the common case of cpu
hotplug, which is soft onlining / offlining of cpus - in this case we
should just leave the queue and irq mapping as is. If it actually
worked it would have helped in the case of physical cpu hotplug,
although for that we'd need a way to actually notify the driver.
Note that drivers may already be able to accommodate such a topology
change on their own, e.g. using the reset_controller sysfs file in NVMe
will cause the driver to get things right for this case.With the rebuild removed we will simplify retain the queue mapping for
a soft offlined CPU that will work when it comes back online, and will
map any newly onlined CPU to queue 0 until the driver initiates
a rebuild of the queue map.Signed-off-by: Christoph Hellwig
Reviewed-by: Keith Busch
Signed-off-by: Jens Axboe -
…p/tip into for-4.9/msi-irq
-
Add a helper to get the affinity mask for a given PCI irq vector. For MSI or
MSI-X vectors these are stored by the IRQ core, while for legacy interrupts
we will always return cpu_possible_map.[hch: updated to follow the style of pci_irq_vector()]
Signed-off-by: Thomas Gleixner
Signed-off-by: Christoph Hellwig
Cc: axboe@fb.com
Cc: keith.busch@intel.com
Cc: agordeev@redhat.com
Cc: linux-block@vger.kernel.org
Link: http://lkml.kernel.org/r/1473862739-15032-6-git-send-email-hch@lst.de
Signed-off-by: Thomas Gleixner -
No more users.
Signed-off-by: Thomas Gleixner
Cc: Christoph Hellwig
Cc: axboe@fb.com
Cc: keith.busch@intel.com
Cc: agordeev@redhat.com
Cc: linux-block@vger.kernel.org
Link: http://lkml.kernel.org/r/1473862739-15032-5-git-send-email-hch@lst.de
Signed-off-by: Thomas Gleixner -
Switch MSI over to the new spreading code. If a pci device contains a valid
pointer to a cpumask, then this mask is used for spreading otherwise the
online cpu mask is used. This allows a driver to restrict the spread to a
subset of CPUs, e.g. cpus on a particular node.Signed-off-by: Thomas Gleixner
Cc: Christoph Hellwig
Cc: axboe@fb.com
Cc: keith.busch@intel.com
Cc: agordeev@redhat.com
Cc: linux-block@vger.kernel.org
Link: http://lkml.kernel.org/r/1473862739-15032-4-git-send-email-hch@lst.de
Signed-off-by: Thomas Gleixner -
The current irq spreading infrastructure is just looking at a cpumask and
tries to spread the interrupts over the mask. Thats suboptimal as it does
not take numa nodes into account.Change the logic so the interrupts are spread across numa nodes and inside
the nodes. If there are more cpus than vectors per node, then we set the
affinity to several cpus. If HT siblings are available we take that into
account and try to set all siblings to a single vector.Signed-off-by: Thomas Gleixner
Cc: Christoph Hellwig
Cc: axboe@fb.com
Cc: keith.busch@intel.com
Cc: agordeev@redhat.com
Cc: linux-block@vger.kernel.org
Link: http://lkml.kernel.org/r/1473862739-15032-3-git-send-email-hch@lst.de -
For irq spreading want to store affinity masks in the msi_entry. Add the
infrastructure for it.We allocate an array of cpumasks with an array size of the number of used
vectors in the entry, so we can hand in the information per linux interrupt
later.As we hand in the number of used vectors, we assign them right
away. Convert all the call sites.Signed-off-by: Thomas Gleixner
Cc: axboe@fb.com
Cc: keith.busch@intel.com
Cc: agordeev@redhat.com
Cc: linux-block@vger.kernel.org
Cc: Christoph Hellwig
Link: http://lkml.kernel.org/r/1473862739-15032-2-git-send-email-hch@lst.de -
blk_mq_delay_kick_requeue_list() provides the ability to kick the
q->requeue_list after a specified time. To do this the request_queue's
'requeue_work' member was changed to a delayed_work.blk_mq_delay_kick_requeue_list() allows DM to defer processing requeued
requests while it doesn't make sense to immediately requeue them
(e.g. when all paths in a DM multipath have failed).Signed-off-by: Mike Snitzer
Signed-off-by: Jens Axboe
14 Sep, 2016
11 commits
-
Signed-off-by: Christoph Hellwig
Reviewed-by: Bart Van Assche
Signed-off-by: Jens Axboe -
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe -
Signed-off-by: Christoph Hellwig
Reviewed-by: Bart Van Assche
Signed-off-by: Jens Axboe -
Since REQ_OP_BITS == 3 and __REQ_NR_BITS == 30 it is not that hard
to pass an op_flags argument to bio_set_op_attrs() that is larger
than the number of bits reserved for the op_flags argument. Complain
if this happens. Additionally, ensure that negative arguments trigger
a complaint (1 << ... is signed while 1U << ... is unsigned; adding
0U to an integer expression causes it to be promoted to an unsigned
type).Signed-off-by: Bart Van Assche
Cc: Mike Christie
Cc: Christoph Hellwig
Cc: Hannes Reinecke
Cc: Damien Le Moal
Reviewed-by: Johannes Thumshirn
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe -
Introduce the bio_flags() macro. Ensure that the second argument of
bio_set_op_attrs() only contains flags and no operation. This patch
does not change any functionality.Signed-off-by: Bart Van Assche
Cc: Mike Christie
Cc: Chris Mason (maintainer:BTRFS FILE SYSTEM)
Cc: Josef Bacik (maintainer:BTRFS FILE SYSTEM)
Cc: Mike Snitzer
Cc: Hannes Reinecke
Cc: Damien Le Moal
Reviewed-by: Johannes Thumshirn
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe -
Make it clear that the sizeof(unsigned int) expression in BIO_OP_SHIFT
refers to the bi_opf member of struct bio.Signed-off-by: Bart Van Assche
Cc: Mike Christie
Cc: Christoph Hellwig
Cc: Hannes Reinecke
Cc: Damien Le Moal
Reviewed-by: Johannes Thumshirn
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe -
commit e1defc4ff0cf57aca6c5e3ff99fa503f5943c1f1
"block: Do away with the notion of hardsect_size"
removed the notion of "hardware sector size" from
the kernel in favor of logical block size, but
references remain in comments and documentation.Update the remaining sites mentioning hardsect.
Signed-off-by: Linus Walleij
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe -
The blk_mq_alloc_single_hw_queue() is a prototype artifact that
should have been removed with
commit cdef54dd85ad66e77262ea57796a3e81683dd5d6
"blk-mq: remove alloc_hctx and free_hctx methods" where the last
users of it were deleted.Fixes: cdef54dd85ad ("blk-mq: remove alloc_hctx and free_hctx methods")
Signed-off-by: Linus Walleij
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe -
DAX support for block devices was removed in commits 03cdad
("block: disable block device DAX by default") and 99a01cd
("block: remove BLK_DEV_DAX config option"), but we still kept a call to
dax_do_io and some uneeded i_flags manipulations introduced in commit
bbab37 ("block: Add support for DAX reads/writes to block devices").Remove those leftovers.
Signed-off-by: Christoph Hellwig
Reviewed-by: Johannes Thumshirn
Acked-by: Dan Williams
Signed-off-by: Jens Axboe -
Allow the io_poll statistics to be zeroed to make for easier logging
of polling event.Signed-off-by: Stephen Bates
Acked-by: Christoph Hellwig
Signed-off-by: Jens Axboe -
In order to help determine the effectiveness of polling in a running
system it is usful to determine the ratio of how often the poll
function is called vs how often the completion is checked. For this
reason we add a poll_considered variable and add it to the sysfs entry
for io_poll.Signed-off-by: Stephen Bates
Acked-by: Christoph Hellwig
Signed-off-by: Jens Axboe
12 Sep, 2016
4 commits
-
Commit aa71987472a9 ("nvme: fabrics drivers don't need the nvme-pci
driver") removed the dependency on BLK_DEV_NVME, but the cdoe does
depend on the block layer (which used to be an implicit dependency
through BLK_DEV_NVME).Otherwise you get various errors from the kbuild test robot random
config testing when that happens to hit a configuration with BLOCK
device support disabled.Cc: Christoph Hellwig
Cc: Jay Freyensee
Cc: Sagi Grimberg
Signed-off-by: Linus Torvalds -
Pull IIO fixes from Greg KH:
"Here are a few small IIO fixes for 4.8-rc6.Nothing major, full details are in the shortlog, all of these have
been in linux-next with no reported issues"* tag 'staging-4.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
iio:core: fix IIO_VAL_FRACTIONAL sign handling
iio: ensure ret is initialized to zero before entering do loop
iio: accel: kxsd9: Fix scaling bug
iio: accel: bmc150: reset chip at init time
iio: fix pressure data output unit in hid-sensor-attributes
tools:iio:iio_generic_buffer: fix trigger-less mode -
Pull USB fixes from Greg KH:
"Here are some small USB gadget, phy, and xhci fixes for 4.8-rc6.All of these resolve minor issues that have been reported, and all
have been in linux-next with no reported issues"* tag 'usb-4.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: chipidea: udc: fix NULL ptr dereference in isr_setup_status_phase
xhci: fix null pointer dereference in stop command timeout function
usb: dwc3: pci: fix build warning on !PM_SLEEP
usb: gadget: prevent potenial null pointer dereference on skb->len
usb: renesas_usbhs: fix clearing the {BRDY,BEMP}STS condition
usb: phy: phy-generic: Check clk_prepare_enable() error
usb: gadget: udc: renesas-usb3: clear VBOUT bit in DRD_CON
Revert "usb: dwc3: gadget: always decrement by 1"
11 Sep, 2016
3 commits
-
Pull libnvdimm fixes from Dan Williams:
"nvdimm fixes for v4.8, two of them are tagged for -stable:- Fix devm_memremap_pages() to use track_pfn_insert(). Otherwise,
DAX pmd mappings end up with an uncached pgprot, and unusable
performance for the device-dax interface. The device-dax interface
appeared in 4.7 so this is tagged for -stable.- Fix a couple VM_BUG_ON() checks in the show_smaps() path to
understand DAX pmd entries. This fix is tagged for -stable.- Fix a mis-merge of the nfit machine-check handler to flip the
polarity of an if() to match the final version of the patch that
Vishal sent for 4.8-rc1. Without this the nfit machine check
handler never detects / inserts new 'badblocks' entries which
applications use to identify lost portions of files.- For test purposes, fix the nvdimm_clear_poison() path to operate on
legacy / simulated nvdimm memory ranges. Without this fix a test
can set badblocks, but never clear them on these ranges.- Fix the range checking done by dax_dev_pmd_fault(). This is not
tagged for -stable since this problem is mitigated by specifying
aligned resources at device-dax setup time.These patches have appeared in a next release over the past week. The
recent rebase you can see in the timestamps was to drop an invalid fix
as identified by the updated device-dax unit tests [1]. The -mm
touches have an ack from Andrew"[1]: "[ndctl PATCH 0/3] device-dax test for recent kernel bugs"
https://lists.01.org/pipermail/linux-nvdimm/2016-September/006855.html* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
libnvdimm: allow legacy (e820) pmem region to clear bad blocks
nfit, mce: Fix SPA matching logic in MCE handler
mm: fix cache mode of dax pmd mappings
mm: fix show_smap() for zone_device-pmd ranges
dax: fix mapping size check -
Pull i2c fixes from Wolfram Sang:
"Mostly driver bugfixes, but also a few cleanups which are nice to have
out of the way"* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: rk3x: Restore clock settings at resume time
i2c: Spelling s/acknowedge/acknowledge/
i2c: designware: save the preset value of DW_IC_SDA_HOLD
Documentation: i2c: slave-interface: add note for driver development
i2c: mux: demux-pinctrl: run properly with multiple instances
i2c: bcm-kona: fix inconsistent indenting
i2c: rcar: use proper device with dma_mapping_error
i2c: sh_mobile: use proper device with dma_mapping_error
i2c: mux: demux-pinctrl: invalidate properly when switching fails -
Pull fscrypto fixes fromTed Ts'o:
"Fix some brown-paper-bag bugs for fscrypto, including one one which
allows a malicious user to set an encryption policy on an empty
directory which they do not own"* tag 'for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
fscrypto: require write access to mount to set encryption policy
fscrypto: only allow setting encryption policy on directories
fscrypto: add authorization check for setting encryption policy
10 Sep, 2016
5 commits
-
Since setting an encryption policy requires writing metadata to the
filesystem, it should be guarded by mnt_want_write/mnt_drop_write.
Otherwise, a user could cause a write to a frozen or readonly
filesystem. This was handled correctly by f2fs but not by ext4. Make
fscrypt_process_policy() handle it rather than relying on the filesystem
to get it right.Signed-off-by: Eric Biggers
Cc: stable@vger.kernel.org # 4.1+; check fs/{ext4,f2fs}
Signed-off-by: Theodore Ts'o
Acked-by: Jaegeuk Kim -
The FS_IOC_SET_ENCRYPTION_POLICY ioctl allowed setting an encryption
policy on nondirectory files. This was unintentional, and in the case
of nonempty regular files did not behave as expected because existing
data was not actually encrypted by the ioctl.In the case of ext4, the user could also trigger filesystem errors in
->empty_dir(), e.g. due to mismatched "directory" checksums when the
kernel incorrectly tried to interpret a regular file as a directory.This bug affected ext4 with kernels v4.8-rc1 or later and f2fs with
kernels v4.6 and later. It appears that older kernels only permitted
directories and that the check was accidentally lost during the
refactoring to share the file encryption code between ext4 and f2fs.This patch restores the !S_ISDIR() check that was present in older
kernels.Signed-off-by: Eric Biggers
Cc: stable@vger.kernel.org
Signed-off-by: Theodore Ts'o -
On an ext4 or f2fs filesystem with file encryption supported, a user
could set an encryption policy on any empty directory(*) to which they
had readonly access. This is obviously problematic, since such a
directory might be owned by another user and the new encryption policy
would prevent that other user from creating files in their own directory
(for example).Fix this by requiring inode_owner_or_capable() permission to set an
encryption policy. This means that either the caller must own the file,
or the caller must have the capability CAP_FOWNER.(*) Or also on any regular file, for f2fs v4.6 and later and ext4
v4.8-rc1 and later; a separate bug fix is coming for that.Signed-off-by: Eric Biggers
Cc: stable@vger.kernel.org # 4.1+; check fs/{ext4,f2fs}
Signed-off-by: Theodore Ts'o -
Bad blocks can be injected via /sys/block/pmemN/badblocks. In a situation
where legacy pmem is being used or a pmem region created by using memmap
kernel parameter, the injected bad blocks are not cleared due to
nvdimm_clear_poison() failing from lack of ndctl function pointer. In
this case we need to just return as handled and allow the bad blocks to
be cleared rather than fail.Reviewed-by: Vishal Verma
Signed-off-by: Dave Jiang
Signed-off-by: Dan Williams -
The check for a 'pmem' type SPA in the MCE handler was inverted due to a
merge/rebase error.Fixes: 6839a6d nfit: do an ARS scrub on hitting a latent media error
Cc: linux-acpi@vger.kernel.org
Cc: Dan Williams
Signed-off-by: Vishal Verma
Signed-off-by: Dan Williams