Eric Lee / smarc-fsl-linux-kernel

28 May, 2016

1 commit

564884fbd Merge branch 'for-linus' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
"A set of fixes that wasn't included in the first merge window pull
request. This pull request contains:

- A set of NVMe fixes from Keith, and one from Nic for the integrity
side of it.

- Fix from Ming, clearing ->mq_ops if we don't successfully set up a
queue for multiqueue.

- A set of stability fixes for bcache from Jiri, and also marking
bcache as orphaned as it's no longer actively maintained (in
mainline, at least)"

* 'for-linus' of git://git.kernel.dk/linux-block:
blk-mq: clear q->mq_ops if init fail
MAINTAINERS: mark bcache as orphan
bcache: bch_gc_thread() is not freezable
bcache: bch_allocator_thread() is not freezable
bcache: bch_writeback_thread() is not freezable
nvme/host: Add missing blk_integrity tag_size + flags assignments
NVMe: Add device ID's with stripe quirk
NVMe: Short-cut removal on surprise hot-unplug
NVMe: Allow user initiated rescan
NVMe: Reduce driver log spamming
NVMe: Unbind driver on failure
NVMe: Delete only created queues
NVMe: Allocate queues only for online cpus

Linus Torvalds
2016-05-28 05:28:09 +0800

27 May, 2016

1 commit

315227f6d Merge tag 'dax-misc-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull misc DAX updates from Vishal Verma:
"DAX error handling for 4.7

- Until now, dax has been disabled if media errors were found on any
device. This enables the use of DAX in the presence of these
errors by making all sector-aligned zeroing go through the driver.

- The driver (already) has the ability to clear errors on writes that
are sent through the block layer using 'DSMs' defined in ACPI 6.1.

Other misc changes:

- When mounting DAX filesystems, check to make sure the partition is
page aligned. This is a requirement for DAX, and previously, we
allowed such unaligned mounts to succeed, but subsequent
reads/writes would fail.

- Misc/cleanup fixes from Jan that remove unused code from DAX
related to zeroing, writeback, and some size checks"

* tag 'dax-misc-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
dax: fix a comment in dax_zero_page_range and dax_truncate_page
dax: for truncate/hole-punch, do zeroing through the driver if possible
dax: export a low-level __dax_zero_page_range helper
dax: use sb_issue_zerout instead of calling dax_clear_sectors
dax: enable dax in the presence of known media errors (badblocks)
dax: fallback from pmd to pte on error
block: Update blkdev_dax_capable() for consistency
xfs: Add alignment check for DAX mount
ext2: Add alignment check for DAX mount
ext4: Add alignment check for DAX mount
block: Add bdev_dax_supported() for dax mount checks
block: Add vfs_msg() interface
dax: Remove redundant inode size checks
dax: Remove pointless writeback from dax_do_io()
dax: Remove zeroing from dax_io()
dax: Remove dead zeroing code from fault handlers
ext2: Avoid DAX zeroing to corrupt data
ext2: Fix block zeroing in ext2_get_blocks() for DAX
dax: Remove complete_unwritten argument
DAX: move RADIX_DAX_ definitions to dax.c

Linus Torvalds
2016-05-27 10:34:26 +0800

26 May, 2016

1 commit

c7de57263 blk-mq: clear q->mq_ops if init fail

blk_mq_init_queue() calls blk_mq_init_allocated_queue(), but q->mq_ops
was not cleared when blk_mq_init_allocated_queue() failed.
blk_cleanup_queue() then calls blk_mq_free_queue(), which crashes because:
- q->all_q_node has not been added to all_q_list yet
- q->tag_set is NULL
- hctx was not set up yet or has already been freed

Fix it by clearing q->mq_ops on the error path.

Signed-off-by: Ming Lin
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Ming Lin
2016-05-26 22:51:43 +0800
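
The error path described in c7de57263 can be sketched in plain userspace C. This is only an illustrative model, not the kernel code: `demo_queue`, `demo_mq_ops`, `demo_init_allocated_queue()` and `demo_cleanup_queue()` are hypothetical stand-ins, and the real teardown does far more than bump a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct demo_mq_ops { int unused; };

struct demo_queue {
    const struct demo_mq_ops *mq_ops; /* non-NULL marks the queue as multiqueue */
    int mq_teardowns;                 /* how often the mq-only teardown ran */
};

static const struct demo_mq_ops demo_ops = { 0 };

/* Models the init step: returns 0 on success, -1 on failure. */
static int demo_init_allocated_queue(struct demo_queue *q, int simulate_failure)
{
    assert(q != NULL);
    q->mq_ops = &demo_ops;
    if (simulate_failure) {
        /* The fix: drop the mq marking so cleanup skips the mq teardown. */
        q->mq_ops = NULL;
        return -1;
    }
    return 0;
}

/* Models the cleanup step: the mq teardown only runs for mq queues. */
static void demo_cleanup_queue(struct demo_queue *q)
{
    assert(q != NULL);
    if (q->mq_ops)
        q->mq_teardowns++;
}
```

With the marking cleared on failure, a later cleanup call never reaches the multiqueue teardown for a queue that was only partly initialized.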

24 May, 2016

1 commit

1f40c4957 Merge tag 'libnvdimm-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Dan Williams:
"The bulk of this update was stabilized before the merge window and
appeared in -next. The "device dax" implementation was revised this
week in response to review feedback, and to address failures detected
by the recently expanded ndctl unit test suite.

Not included in this pull request are two dax topic branches (dax
error handling, and dax radix-tree locking). These topics were
deferred to get a few more days of -next integration testing, and to
coordinate a branch baseline with Ted and the ext4 tree. Vishal and
Ross will send the error handling and locking topics respectively in
the next few days.

This branch has received a positive build result from the kbuild robot
across 226 configs.

Summary:

- Device DAX for persistent memory: Device DAX is the device-centric
analogue of Filesystem DAX (CONFIG_FS_DAX). It allows memory
ranges to be allocated and mapped without need of an intervening
file system. Device DAX is strict, precise and predictable.
Specifically this interface:

a) Guarantees fault granularity with respect to a given page size
(pte, pmd, or pud) set at configuration time.

b) Enforces deterministic behavior by being strict about what
fault scenarios are supported.

Persistent memory is the first target, but the mechanism is also
targeted for exclusive allocations of performance/feature
differentiated memory ranges.

- Support for the HPE DSM (device specific method) command formats.
This enables management of these first generation devices until a
unified DSM specification materializes.

- Further ACPI 6.1 compliance with support for the common dimm
identifier format.

- Various fixes and cleanups across the subsystem"

* tag 'libnvdimm-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (40 commits)
libnvdimm, dax: fix deletion
libnvdimm, dax: fix alignment validation
libnvdimm, dax: autodetect support
libnvdimm: release ida resources
Revert "block: enable dax for raw block devices"
/dev/dax, core: file operations and dax-mmap
/dev/dax, pmem: direct access to persistent memory
libnvdimm: stop requiring a driver ->remove() method
libnvdimm, dax: record the specified alignment of a dax-device instance
libnvdimm, dax: reserve space to store labels for device-dax
libnvdimm, dax: introduce device-dax infrastructure
nfit: add sysfs dimm 'family' and 'dsm_mask' attributes
tools/testing/nvdimm: ND_CMD_CALL support
nfit: disable vendor specific commands
nfit: export subsystem ids as attributes
nfit: fix format interface code byte order per ACPI6.1
nfit, libnvdimm: limited/whitelisted dimm command marshaling mechanism
nfit, libnvdimm: clarify "commands" vs "_DSMs"
libnvdimm: increase max envelope size for ioctl
acpi/nfit: Add sysfs "id" for NVDIMM ID
...

Linus Torvalds
2016-05-24 02:18:01 +0800

21 May, 2016

2 commits

acc93d30d Revert "block: enable dax for raw block devices"

This reverts commit 5a023cdba50c5f5f2bc351783b3131699deb3937.

The functionality is superseded by the new "Device DAX" facility.

Cc: Jeff Moyer
Cc: Christoph Hellwig
Cc: Dave Chinner
Cc: Andrew Morton
Cc: Ross Zwisler
Cc: Jan Kara
Signed-off-by: Dan Williams

Dan Williams
2016-05-21 13:02:56 +0800

7244ad69c block/partitions/ldm.c: use generic UUID library

Instead of open-coding, let's use the generic UUID library functions here.

Signed-off-by: Andy Shevchenko
Cc: "Richard Russon (FlatCap)"
Cc: Jens Axboe
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andy Shevchenko
2016-05-21 08:58:30 +0800

18 May, 2016

3 commits

16bf83480 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial

Pull trivial tree updates from Jiri Kosina.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (21 commits)
gitignore: fix wording
mfd: ab8500-debugfs: fix "between" in printk
memstick: trivial fix of spelling mistake on management
cpupowerutils: bench: fix "average"
treewide: Fix typos in printk
IB/mlx4: printk fix
pinctrl: sirf/atlas7: fix printk spelling
serial: mctrl_gpio: Grammar s/lines GPIOs/line GPIOs/, /sets/set/
w1: comment spelling s/minmum/minimum/
Blackfin: comment spelling s/divsor/divisor/
metag: Fix misspellings in comments.
ia64: Fix misspellings in comments.
hexagon: Fix misspellings in comments.
tools/perf: Fix misspellings in comments.
cris: Fix misspellings in comments.
c6x: Fix misspellings in comments.
blackfin: Fix misspelling of 'register' in comment.
avr32: Fix misspelling of 'definitions' in comment.
treewide: Fix typos in printk
Doc: treewide : Fix typos in DocBook/filesystem.xml
...

Linus Torvalds
2016-05-18 08:05:30 +0800

24b9f0cf0 Merge branch 'for-4.7/drivers' of git://git.kernel.dk/linux-block

Pull block driver updates from Jens Axboe:
"On top of the core pull request, this is the drivers pull request for
this merge window. This contains:

- Switch drivers to the new write back cache API, and kill off the
flush flags. From me.

- Kill the discard support for the STEC pci-e flash driver. It's
trivially broken, and apparently unmaintained, so it's safer to
just remove it. From Jeff Moyer.

- A set of lightnvm updates from the usual suspects (Matias/Javier,
and Simon), and fixes from Arnd, Jeff Mahoney, Sagi, and Wenwei
Tao.

- A set of updates for NVMe:

- Turn the controller state management into a proper state
machine. From Christoph.

- Shuffling of code in preparation for NVMe-over-fabrics, also
from Christoph.

- Cleanup of the command prep part from Ming Lin.

- Rewrite of the discard support from Ming Lin.

- Deadlock fix for namespace removal from Ming Lin.

- Use the now exported blk-mq tag helper for IO termination.
From Sagi.

- Various little fixes from Christoph, Guilherme, Keith, Ming
Lin, Wang Sheng-Hui.

- Convert mtip32xx to use the now exported blk-mq tag iter function,
from Keith"

* 'for-4.7/drivers' of git://git.kernel.dk/linux-block: (74 commits)
lightnvm: reserved space calculation incorrect
lightnvm: rename nr_pages to nr_ppas on nvm_rq
lightnvm: add is_cached entry to struct ppa_addr
lightnvm: expose gennvm_mark_blk to targets
lightnvm: remove mgt targets on mgt removal
lightnvm: pass dma address to hardware rather than pointer
lightnvm: do not assume sequential lun alloc.
nvme/lightnvm: Log using the ctrl named device
lightnvm: rename dma helper functions
lightnvm: enable metadata to be sent to device
lightnvm: do not free unused metadata on rrpc
lightnvm: fix out of bound ppa lun id on bb tbl
lightnvm: refactor set_bb_tbl for accepting ppa list
lightnvm: move responsibility for bad blk mgmt to target
lightnvm: make nvm_set_rqd_ppalist() aware of vblks
lightnvm: remove struct factory_blks
lightnvm: refactor device ops->get_bb_tbl()
lightnvm: introduce nvm_for_each_lun_ppa() macro
lightnvm: refactor dev->online_target to global nvm_targets
lightnvm: rename nvm_targets to nvm_tgt_type
...

Linus Torvalds
2016-05-18 07:03:32 +0800

a4d1dbed0 Merge branch 'for-4.7/core' of git://git.kernel.dk/linux-block

Pull core block layer updates from Jens Axboe:
"This is the core block IO changes for this merge window. Nothing
earth shattering in here, it's mostly just fixes. In detail:

- Fix for a long standing issue where wrong ordering in blk-mq caused
order_to_size() to spew a warning. From Bart.

- Async discard support from Christoph. Basically just splitting our
sync interface into a submit + wait part.

- Add a cleaner interface for flagging whether a device has a write
back cache or not. We've previously overloaded blk_queue_flush()
with this, but let's make it more explicit. Drivers cleaned up and
updated in the drivers pull request. From me.

- Fix for a double check for whether IO accounting is enabled or not.
From Michael Callahan.

- Fix for the async discard from Mike Snitzer, reinstating the early
EOPNOTSUPP return if the device doesn't support discards.

- Also from Mike, export bio_inc_remaining() so dm can drop its
private copy of it.

- From Ming Lin, add support for passing in an offset for request
payloads.

- Tag function export from Sagi, which will be used in NVMe in the
drivers pull.

- Two blktrace related fixes from Shaohua.

- Propagate NOMERGE flag when making a request from a bio, also from
Shaohua.

- An optimization to not parse cgroup paths in blk-throttle, if we
don't need to. From Shaohua"

* 'for-4.7/core' of git://git.kernel.dk/linux-block:
blk-mq: fix undefined behaviour in order_to_size()
blk-throttle: don't parse cgroup path if trace isn't enabled
blktrace: add missed mask name
blktrace: delete garbage for message trace
block: make bio_inc_remaining() interface accessible again
block: reinstate early return of -EOPNOTSUPP from blkdev_issue_discard
block: Minor blk_account_io_start usage cleanup
block: add __blkdev_issue_discard
block: remove struct bio_batch
block: copy NOMERGE flag from bio to request
block: add ability to flag write back caching on a device
blk-mq: Export tagset iter function
block: add offset in blk_add_request_payload()
writeback: Fix performance regression in wb_over_bg_thresh()

Linus Torvalds
2016-05-18 06:29:49 +0800

17 May, 2016

1 commit

a8078b1fc block: Update blkdev_dax_capable() for consistency

blkdev_dax_capable() is similar to bdev_dax_supported(), but needs
to remain as a separate interface for checking dax capability of
a raw block device.

Rename and relocate blkdev_dax_capable() to keep them maintained
consistently, and call bdev_direct_access() for the dax capability
check.

There is no change in the behavior.

Link: https://lkml.org/lkml/2016/5/9/950
Signed-off-by: Toshi Kani
Reviewed-by: Jan Kara
Cc: Alexander Viro
Cc: Jens Axboe
Cc: Andreas Dilger
Cc: Jan Kara
Cc: Dave Chinner
Cc: Dan Williams
Cc: Ross Zwisler
Cc: Christoph Hellwig
Cc: Boaz Harrosh
Signed-off-by: Vishal Verma

Toshi Kani
2016-05-17 14:44:13 +0800

16 May, 2016

1 commit

b3a834b15 blk-mq: fix undefined behaviour in order_to_size()

When the this_order variable in blk_mq_init_rq_map() becomes zero,
the code incorrectly decrements it and passes the result to the
order_to_size() helper, causing undefined behaviour:

UBSAN: Undefined behaviour in block/blk-mq.c:1459:27
shift exponent 4294967295 is too large for 32-bit type 'unsigned int'
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.6.0-rc6-00072-g33656a1 #22

Fix the code by first checking that the this_order variable is not
zero.

Reported-by: Meelis Roos
Fixes: 320ae51feed5 ("blk-mq: new multi-queue block IO queueing mechanism")
Signed-off-by: Bartlomiej Zolnierkiewicz
Signed-off-by: Jens Axboe

Bartlomiej Zolnierkiewicz
2016-05-16 23:54:47 +0800
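
The check-before-decrement pattern from b3a834b15 can be sketched as follows. This is a minimal userspace model under stated assumptions, not the code in block/blk-mq.c: `demo_pick_order()`, `demo_order_to_size()`, the 4096-byte page size and the `avail_order` parameter (the largest order the pretend allocator can satisfy, or -1 for none) are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096u

/* Size in bytes of a 2^order page allocation (hypothetical helper). */
static size_t demo_order_to_size(unsigned int order)
{
    /* Shifting by the width of the type or more is undefined behaviour. */
    assert(order < 32);
    return (size_t)DEMO_PAGE_SIZE << order;
}

/*
 * Try orders from max_order downward. Returns the order that was
 * "allocated", or -1 if the loop gave up.
 */
static int demo_pick_order(unsigned int max_order, int avail_order, size_t rq_size)
{
    unsigned int this_order = max_order;

    for (;;) {
        if (avail_order >= 0 && this_order <= (unsigned int)avail_order)
            return (int)this_order;   /* the pretend allocation succeeded */
        if (!this_order)
            break;                    /* check for zero first ... */
        this_order--;                 /* ... and only then decrement */
        if (demo_order_to_size(this_order) < rq_size)
            break;                    /* a smaller chunk cannot hold a request */
    }
    return -1;
}
```

Because the zero check happens before the decrement, `this_order` never wraps around to a huge unsigned value, so the shift inside the size helper always sees a valid exponent.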

11 May, 2016

1 commit

e4d35be58 Merge branch 'ovl-fixes' into for-linus

Al Viro
2016-05-11 12:00:29 +0800

10 May, 2016

1 commit

59fa0224c blk-throttle: don't parse cgroup path if trace isn't enabled

If trace isn't enabled, parsing the cgroup path just wastes CPU.

Signed-off-by: Shaohua Li
Signed-off-by: Jens Axboe

Shaohua Li
2016-05-10 22:41:37 +0800

06 May, 2016

2 commits

0ef5a50c1 block: make bio_inc_remaining() interface accessible again

Commit 326e1dbb57 ("block: remove management of bi_remaining when
restoring original bi_end_io") made bio_inc_remaining() private to bio.c
because the only use-case that made sense was confined to the
bio_chain() interface.

Since that time DM thinp went on to use bio_chain() in its relatively
complex implementation of async discard support. That implementation,
even when converted over to use the new async __blkdev_issue_discard()
interface, depends on deferred completion of the original discard bio --
which is most appropriately implemented using bio_inc_remaining().

DM thinp foolishly duplicated bio_inc_remaining(), local to dm-thin.c as
__bio_inc_remaining(), so re-exporting bio_inc_remaining() allows us to
put an end to that foolishness.

All said, bio_inc_remaining() should really only be used in conjunction
with bio_chain(). It isn't intended for generic bio reference counting.

Signed-off-by: Mike Snitzer
Acked-by: Joe Thornber
Signed-off-by: Jens Axboe

Mike Snitzer
2016-05-06 03:03:29 +0800

bbd848e0f block: reinstate early return of -EOPNOTSUPP from blkdev_issue_discard

Commit 38f25255330 ("block: add __blkdev_issue_discard") incorrectly
disallowed the early return of -EOPNOTSUPP if the device doesn't support
discard (or secure discard). This early return of -EOPNOTSUPP has
always been part of blkdev_issue_discard() interface so there isn't a
good reason to break that behaviour -- especially when it can be easily
reinstated.

The nuance of allowing early return of -EOPNOTSUPP vs disallowing late
return of -EOPNOTSUPP is: if the overall device never advertised support
for discards and one is issued to the device it is beneficial to inform
the caller that discards are not supported via -EOPNOTSUPP. But if a
device advertises discard support it means that at least a subset of the
device does have discard support -- but it could be that discards issued
to some regions of a stacked device will not be supported. In that case
the late return of -EOPNOTSUPP must be disallowed.

Fixes: 38f25255330 ("block: add __blkdev_issue_discard")
Signed-off-by: Mike Snitzer
Signed-off-by: Jens Axboe

Mike Snitzer
2016-05-06 03:03:26 +0800
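
The distinction drawn in bbd848e0f between an early and a late -EOPNOTSUPP can be sketched as below. This is a simplified userspace model, not blkdev_issue_discard() itself: `demo_dev`, its two fields and `demo_issue_discard()` are hypothetical, and `submit_result` merely stands in for whatever the lower layers of a stacked device would report.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of a (possibly stacked) block device. */
struct demo_dev {
    int advertises_discard; /* does the device claim discard support at all? */
    int submit_result;      /* what submitting the discard would return */
};

static int demo_issue_discard(const struct demo_dev *dev)
{
    int ret;

    assert(dev != NULL);

    /* Early return: the device never advertised discard support. */
    if (!dev->advertises_discard)
        return -EOPNOTSUPP;

    ret = dev->submit_result;

    /*
     * Late -EOPNOTSUPP: only some region of a stacked device lacks
     * support, so the caller is not told about it.
     */
    if (ret == -EOPNOTSUPP)
        ret = 0;

    return ret;
}
```

Other errors, such as -EIO, still propagate to the caller unchanged in this sketch.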

03 May, 2016

1 commit

a21f2a3ec block: Minor blk_account_io_start usage cleanup

blk_account_io_start does not need to be wrapped with blk_do_io_stat
as it already checks for that condition.

Signed-off-by: Michael Callahan
Signed-off-by: Jens Axboe

Michael Callahan
2016-05-03 23:26:58 +0800

02 May, 2016

2 commits

38f252553 block: add __blkdev_issue_discard

This is a version of blkdev_issue_discard which doesn't wait for
the I/O to complete, but instead allows the caller to submit
the final bio and/or chain it to others.

Signed-off-by: Christoph Hellwig
Signed-off-by: Ming Lin
Signed-off-by: Sagi Grimberg
Reviewed-by: Ming Lei
Signed-off-by: Jens Axboe

Christoph Hellwig
2016-05-02 23:19:46 +0800

9082e87bf block: remove struct bio_batch

It can be replaced with a combination of bio_chain and submit_bio_wait.

Signed-off-by: Christoph Hellwig
Signed-off-by: Ming Lin
Signed-off-by: Sagi Grimberg
Reviewed-by: Ming Lei
Signed-off-by: Jens Axboe

Christoph Hellwig
2016-05-02 23:19:43 +0800

18 Apr, 2016

1 commit

c19ca6cb4 treewide: Fix typos in printk

This patch fixes spelling typos found in printk messages
within various parts of the kernel sources.

Signed-off-by: Masanari Iida
Acked-by: Randy Dunlap
Signed-off-by: Jiri Kosina

Masanari Iida
2016-04-18 17:23:24 +0800

16 Apr, 2016

1 commit

2e5725991 Merge branch 'for-linus' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
"A few fixes for the current series. This contains:

- Two fixes for NVMe:

One fixes a reset race that can be triggered by repeated
insert/removal of the module.

The other fixes an issue on some platforms, where we get probe
timeouts since legacy interrupts aren't working. This used not to
be a problem since we had the worker thread poll for completions,
but since that was killed off, it means those poor souls can't
successfully probe their NVMe device. Use a proper IRQ check and
probe (msi-x -> msi -> legacy), like most other drivers to work
around this. Both from Keith.

- A loop corruption issue with offset in iters, from Ming Lei.

- A fix for not having the partition stat per cpu ref count
initialized before sending out the KOBJ_ADD, which could cause user
space to access the counter prior to initialization. Also from
Ming Lei.

- A fix for using the wrong congestion state, from Kaixu Xia"

* 'for-linus' of git://git.kernel.dk/linux-block:
block: loop: fix filesystem corruption in case of aio/dio
NVMe: Always use MSI/MSI-x interrupts
NVMe: Fix reset/remove race
writeback: fix the wrong congested state variable definition
block: partition: initialize percpuref before sending out KOBJ_ADD

Linus Torvalds
2016-04-16 06:44:10 +0800

14 Apr, 2016

1 commit

c888a8f95 block: kill off q->flush_flags

Now that we converted everything to the newer block write cache
interface, kill off the queue flush_flags and queueable flush
entries.

Signed-off-by: Jens Axboe

Jens Axboe
2016-04-14 03:33:19 +0800

13 Apr, 2016

6 commits

2245f6de6 block: kill blk_queue_flush()

We don't have any drivers left using it, so kill it off. Update
documentation to use the newer blk_queue_write_cache().

Signed-off-by: Jens Axboe
Reviewed-by: Christoph Hellwig

Jens Axboe
2016-04-13 06:00:39 +0800

2f9a0b33a Merge branch 'for-4.7/core' into for-4.7/drivers

Jens Axboe
2016-04-13 05:46:35 +0800

93e9d8e83 block: add ability to flag write back caching on a device

Add an internal helper and flag for setting whether a queue has
write back caching, or write through (or none). Add a sysfs file
to show this as well, and make it changeable from user space.

This will replace the (awkward) blk_queue_flush() interface that
drivers currently use to inform the block layer of write cache state
and capabilities.

Signed-off-by: Jens Axboe
Reviewed-by: Christoph Hellwig

Jens Axboe
2016-04-13 05:46:27 +0800
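
The helper, flag and user-space-changeable setting described in 93e9d8e83 can be modeled roughly as follows. This is a sketch under assumptions, not the kernel implementation: `demo_queue_flags`, `DEMO_QUEUE_FLAG_WC` and the three functions are hypothetical names, and the strings "write back", "write through" and "none" are assumed spellings for what the sysfs file shows and accepts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_QUEUE_FLAG_WC (1u << 0) /* hypothetical "write back cache" bit */

struct demo_queue_flags {
    unsigned int flags;
};

/* Driver-facing helper: record whether the device has a write back cache. */
static void demo_queue_write_cache(struct demo_queue_flags *q, bool wc)
{
    assert(q != NULL);
    if (wc)
        q->flags |= DEMO_QUEUE_FLAG_WC;
    else
        q->flags &= ~DEMO_QUEUE_FLAG_WC;
}

/* What a read of the (assumed) sysfs attribute would report. */
static const char *demo_write_cache_show(const struct demo_queue_flags *q)
{
    assert(q != NULL);
    return (q->flags & DEMO_QUEUE_FLAG_WC) ? "write back" : "write through";
}

/* What a write to the attribute would do; returns 0 or -1 on bad input. */
static int demo_write_cache_store(struct demo_queue_flags *q, const char *buf)
{
    assert(q != NULL && buf != NULL);
    if (strcmp(buf, "write back") == 0)
        demo_queue_write_cache(q, true);
    else if (strcmp(buf, "write through") == 0 || strcmp(buf, "none") == 0)
        demo_queue_write_cache(q, false);
    else
        return -1;
    return 0;
}
```

Keeping the cache state in one explicit flag is what lets a single helper replace the overloaded flush-flags call that drivers used before.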

e8f1e1630 blk-mq: Make blk_mq_all_tag_busy_iter static

There is no caller outside the blk-mq code, so we can make it
static.

Signed-off-by: Sagi Grimberg
Reviewed-by: Christoph Hellwig
Reviewed-by: Johannes Thumshirn
Signed-off-by: Jens Axboe

Sagi Grimberg
2016-04-13 05:07:36 +0800

e0489487e blk-mq: Export tagset iter function

It's useful to iterate over all the active tags in cases
where we need to fail all the queues' I/O.

Signed-off-by: Sagi Grimberg
[hch: carefully check for valid tagsets]
Reviewed-by: Christoph Hellwig
Reviewed-by: Johannes Thumshirn
Signed-off-by: Jens Axboe

Sagi Grimberg
2016-04-13 03:43:53 +0800

37e58237a block: add offset in blk_add_request_payload()

The payload may be allocated with kmalloc(), so we need its offset within the page.

Signed-off-by: Ming Lin
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Ming Lin
2016-04-13 03:13:23 +0800
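
Why 37e58237a needs an offset can be shown with a little address arithmetic: a buffer from a general-purpose allocator usually does not start on a page boundary. The sketch below is a userspace illustration only; `demo_payload`, `demo_describe_payload()` and the 4096-byte page size are hypothetical and do not mirror the signature of blk_add_request_payload().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u

/* Hypothetical description of a payload as (page, offset, length). */
struct demo_payload {
    uintptr_t page_base; /* address of the page containing the buffer */
    size_t offset;       /* where the buffer starts inside that page */
    size_t len;          /* payload length in bytes */
};

static struct demo_payload demo_describe_payload(uintptr_t addr, size_t len)
{
    struct demo_payload p;

    p.offset = (size_t)(addr & (DEMO_PAGE_SIZE - 1));
    p.page_base = addr - p.offset;
    p.len = len;

    /* This sketch assumes the payload does not cross a page boundary. */
    assert(p.offset + p.len <= DEMO_PAGE_SIZE);
    return p;
}
```

A page-aligned buffer yields an offset of zero, which is the only case an interface without the offset parameter could describe.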

09 Apr, 2016

1 commit

357f435d8 fix the copy vs. map logics in blk_rq_map_user_iov()

Signed-off-by: Al Viro

Al Viro
2016-04-09 07:46:28 +0800

05 Apr, 2016

2 commits

ea1754a08 mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage

Mostly direct substitution with occasional adjustment or removing
outdated comments.

Signed-off-by: Kirill A. Shutemov
Acked-by: Michal Hocko
Signed-off-by: Linus Torvalds

Kirill A. Shutemov
2016-04-05 01:41:08 +0800

09cbfeaf1 mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros

PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
ago with the promise that one day it would be possible to implement the
page cache with bigger chunks than PAGE_SIZE.

This promise never materialized, and it is unlikely that it ever will.

We have many places where PAGE_CACHE_SIZE is assumed to be equal to
PAGE_SIZE. And it's a constant source of confusion on whether a
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
breakage to be doable.

Let's stop pretending that pages in page cache are special. They are
not.

The changes are pretty straightforward:

- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> E;

- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> E;

- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

- page_cache_get() -> get_page();

- page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
the script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is a revert of the changes to the
PAGE_CACHE_ALIGN definition: we are going to drop it later.

There are a few places in the code that coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation will
also be addressed in a separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov
Acked-by: Michal Hocko
Signed-off-by: Linus Torvalds

Kirill A. Shutemov
2016-04-05 01:41:08 +0800

30 Mar, 2016

1 commit

b30a337ca block: partition: initialize percpuref before sending out KOBJ_ADD

The initialization of a partition's percpu_ref should be done before
sending out the KOBJ_ADD uevent, which may cause userspace to read the
partition table. Otherwise, the uninitialized percpu_ref may be accessed
in the data path.

This patch fixes this issue reported by Naveen.

Reported-by: Naveen Kaje
Tested-by: Naveen Kaje
Fixes: 6c71013ecb7e2(block: partition: convert percpu ref)
Cc: # v4.3+
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe

Ming Lei
2016-03-30 09:18:14 +0800

25 Mar, 2016

1 commit

1d02369db Merge branch 'for-linus' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
"Final round of fixes for this merge window - some of this has come up
after the initial pull request, and some of it was put in a post-merge
branch before the merge window.

This contains:

- Fix for a bad check for an error on dma mapping in the mtip32xx
driver, from Alexey Khoroshilov.

- A set of fixes for lightnvm, from Javier, Matias, and Wenwei.

- An NVMe completion record corruption fix from Marta, ensuring that
we read things in the right order.

- Two writeback fixes from Tejun, marked for stable@ as well.

- A blk-mq sw queue iterator fix from Thomas, fixing an oops for
sparse CPU maps. They hit this in the hot plug/unplug rework"

* 'for-linus' of git://git.kernel.dk/linux-block:
nvme: avoid cqe corruption when update at the same time as read
writeback, cgroup: fix use of the wrong bdi_writeback which mismatches the inode
writeback, cgroup: fix premature wb_put() in locked_inode_to_wb_and_lock_list()
blk-mq: Use proper cpumask iterator
mtip32xx: fix checks for dma mapping errors
lightnvm: do not load L2P table if not supported
lightnvm: do not reserve lun on l2p loading
nvme: lightnvm: return ppa completion status
lightnvm: add a bitmap of luns
lightnvm: specify target's logical address area
null_blk: add lightnvm null_blk device to the nullb_list

Linus Torvalds
2016-03-25 11:00:44 +0800

20 Mar, 2016

1 commit

897bb0c7f blk-mq: Use proper cpumask iterator

queue_for_each_ctx() iterates over per_cpu variables under the assumption that
the possible cpu mask cannot have holes. That's wrong as all cpumasks can have
holes. In case there are holes the iteration ends up accessing uninitialized
memory and crashing as a result.

Replace the macro by a proper for_each_possible_cpu() loop and drop the unused
macro blk_ctx_sum() which references queue_for_each_ctx().

Reported-by: Xiong Zhou
Signed-off-by: Thomas Gleixner
Signed-off-by: Jens Axboe

Thomas Gleixner
2016-03-20 23:34:02 +0800
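
The problem with assuming a hole-free CPU mask, as described in 897bb0c7f, can be demonstrated with an ordinary bitmask. This is an illustrative userspace sketch, not the blk-mq macro or for_each_possible_cpu(): the `demo_*` functions and the 8-CPU limit are hypothetical, and each walk simply returns a bitmap of the CPU ids it touched.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NR_CPUS 8

/* Number of set bits, i.e. how many CPUs the mask marks as possible. */
static unsigned int demo_mask_weight(uint32_t mask)
{
    unsigned int n = 0;

    for (; mask; mask &= mask - 1)
        n++;
    return n;
}

/* Flawed pattern: walk ids 0..weight-1 as if the mask had no holes. */
static uint32_t demo_visit_dense(uint32_t mask)
{
    uint32_t visited = 0;
    unsigned int i, weight = demo_mask_weight(mask);

    for (i = 0; i < weight; i++)
        visited |= UINT32_C(1) << i;
    return visited;
}

/* Safer pattern: test every id and visit only those set in the mask. */
static uint32_t demo_visit_possible(uint32_t mask)
{
    uint32_t visited = 0;
    unsigned int cpu;

    assert(DEMO_NR_CPUS <= 32);
    for (cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
        if (mask & (UINT32_C(1) << cpu))
            visited |= UINT32_C(1) << cpu;
    return visited;
}
```

For a sparse mask such as 0xA5 (CPUs 0, 2, 5 and 7), the dense walk touches CPUs 1 and 3, whose per-CPU data was never set up, and misses CPUs 5 and 7 entirely.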

19 Mar, 2016

2 commits

fcab86add Merge branch 'for-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata

Pull libata updates from Tejun Heo:

- ahci grew runtime power management support so that the controller can
be turned off if no devices are attached.

- sata_via isn't dead yet. It got hotplug support and more refined
workaround for certain WD drives.

- Misc cleanups. There's a merge from for-4.5-fixes to avoid confusing
conflicts in ahci PCI ID table.

* 'for-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
ata: ahci_xgene: dereferencing uninitialized pointer in probe
AHCI: Remove obsolete Intel Lewisburg SATA RAID device IDs
ata: sata_rcar: Use ARCH_RENESAS
sata_via: Implement hotplug for VT6421
sata_via: Apply WD workaround only when needed on VT6421
ahci: Add runtime PM support for the host controller
ahci: Add functions to manage runtime PM of AHCI ports
ahci: Convert driver to use modern PM hooks
ahci: Cache host controller version
scsi: Drop runtime PM usage count after host is added
scsi: Set request queue runtime PM status back to active on resume
block: Add blk_set_runtime_active()
ata: ahci_mvebu: add support for Armada 3700 variant
libata: fix unbalanced spin_lock_irqsave/spin_unlock_irq() in ata_scsi_park_show()
libata: support AHCI on OCTEON platform

Linus Torvalds
2016-03-19 11:06:46 +0800

35d88d97b Merge branch 'for-4.6/core' of git://git.kernel.dk/linux-block

Pull core block updates from Jens Axboe:
"Here are the core block changes for this merge window. Not a lot of
exciting stuff going on in this round, most of the changes have been
on the driver side of things. That pull request is coming next. This
pull request contains:

- A set of fixes for chained bio handling from Christoph.

- A tag bounds check for blk-mq from Hannes, ensuring that we don't
do something stupid if a device reports an invalid tag value.

- A set of fixes/updates for the CFQ IO scheduler from Jan Kara.

- A set of blk-mq fixes from Keith, adding support for dynamic
hardware queues, and fixing init of max_dev_sectors for stacking
devices.

- A fix for the dynamic hw context from Ming.

- Enabling of cgroup writeback support on a block device, from
Shaohua"

* 'for-4.6/core' of git://git.kernel.dk/linux-block:
blk-mq: add bounds check on tag-to-rq conversion
block: bio_remaining_done() isn't unlikely
block: cleanup bio_endio
block: factor out chained bio completion
block: don't unecessarily clobber bi_error for chained bios
block-dev: enable writeback cgroup support
blk-mq: Fix NULL pointer updating nr_requests
blk-mq: mark request queue as mq asap
block: Initialize max_dev_sectors to 0
blk-mq: dynamic h/w context count
cfq-iosched: Allow parent cgroup to preempt its child
cfq-iosched: Allow sync noidle workloads to preempt each other
cfq-iosched: Reorder checks in cfq_should_preempt()
cfq-iosched: Don't group_idle if cfqq has big thinktime

Linus Torvalds
2016-03-19 07:43:11 +0800

17 Mar, 2016

1 commit

6968e6f83 Merge tag 'dm-4.6-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

- Most attention this cycle went to optimizing blk-mq request-based DM
(dm-mq) that is used exclusively by DM multipath:

- A stable fix for dm-mq that eliminates excessive context
switching offers the biggest performance improvement (for both
IOPs and throughput).

- But more work is needed, during the next cycle, to reduce
spinlock contention in DM multipath on large NUMA systems.

- A stable fix for a NULL pointer seen when DM stats is enabled on a DM
multipath device that must requeue an IO due to path failure.

- A stable fix for DM snapshot to disallow the COW and origin devices
from being identical. This amounts to graceful failure in the face
of userspace error because these devices shouldn't ever be identical.

- Stable fixes for DM cache and DM thin provisioning to address crashes
seen if/when their respective metadata device experiences failures
that cause the transition to 'fail_io' mode.

- The DM cache 'mq' policy is now an alias for the 'smq' policy. The
'smq' policy proved to be consistently better than 'mq'. As such
'mq', with all its complex user-facing tunables, has been eliminated.

- Improve DM thin provisioning to consistently return -ENOSPC once the
thin-pool's data volume is out of space.

- Improve DM core to properly handle error propagation if
bio_integrity_clone() fails in clone_bio().

- Other small cleanups and improvements to DM core.

* tag 'dm-4.6-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (41 commits)
dm: fix rq_end_stats() NULL pointer in dm_requeue_original_request()
dm thin: consistently return -ENOSPC if pool has run out of data space
dm cache: bump the target version
dm cache: make sure every metadata function checks fail_io
dm: add missing newline between DM_DEBUG_BLOCK_STACK_TRACING and DM_BUFIO
dm cache policy smq: clarify that mq registration failure was for 'mq'
dm: return error if bio_integrity_clone() fails in clone_bio()
dm thin metadata: don't issue prefetches if a transaction abort has failed
dm snapshot: disallow the COW and origin devices from being identical
dm cache: make the 'mq' policy an alias for 'smq'
dm: drop unnecessary assignment of md->queue
dm: reorder 'struct mapped_device' members to fix alignment and holes
dm: remove dummy definition of 'struct dm_table'
dm: add 'dm_numa_node' module parameter
dm thin metadata: remove needless newline from subtree_dec() DMERR message
dm mpath: cleanup reinstate_path() et al based on code review
dm mpath: remove __pgpath_busy forward declaration, rename to pgpath_busy
dm mpath: switch from 'unsigned' to 'bool' for flags where appropriate
dm round robin: use percpu 'repeat_count' and 'current_path'
dm path selector: remove 'repeat_count' return from .select_path hook
...

Linus Torvalds
2016-03-17 08:26:37 +0800

16 Mar, 2016

2 commits

0d9c51a6e block: partition: add partition specific uevent callbacks for partition info

This patch has been carried in the Android tree for quite some time and
is one of the few patches required to get a mainline kernel up and
running with an existing Android userspace. So I wanted to submit it
for review and consideration if it should be merged.

For partitions, add new uevent parameters 'PARTN', which specifies the
partition's index in the table, and 'PARTNAME', which specifies the
partition name of a partition device.

Android's userspace uses this for creating device node links from the
partition name and number, i.e.:

/dev/block/platform/soc/by-name/system
or
/dev/block/platform/soc/by-num/p1

One can see its usage here:
https://android.googlesource.com/platform/system/core/+/master/init/devices.cpp#355
and
https://android.googlesource.com/platform/system/core/+/master/init/devices.cpp#494

[john.stultz@linaro.org: dropped NPARTS and reworded commit message for context]
Signed-off-by: Dima Zavin
Signed-off-by: John Stultz
Cc: Jens Axboe
Cc: Rom Lemarchand
Cc: Android Kernel Team
Cc: Jeff Moyer
Cc:
Cc: Kees Cook
Cc: Kay Sievers
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

San Mehat
2016-03-16 07:55:16 +0800
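
How a userspace consumer might turn the PARTN and PARTNAME values from 0d9c51a6e into link paths can be sketched as below. This is not Android's init code: `demo_partition_links()` is a hypothetical helper, and the base directory is simply taken from the example paths in the commit message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build by-name and by-num link paths from uevent values.
 * Returns 0 on success, -1 on bad input or if a buffer is too small.
 */
static int demo_partition_links(const char *base, const char *partname, int partn,
                                char *by_name, size_t by_name_len,
                                char *by_num, size_t by_num_len)
{
    int n;

    assert(base && partname && by_name && by_num);

    if (strlen(partname) == 0 || partn < 1)
        return -1;

    n = snprintf(by_name, by_name_len, "%s/by-name/%s", base, partname);
    if (n < 0 || (size_t)n >= by_name_len)
        return -1;

    n = snprintf(by_num, by_num_len, "%s/by-num/p%d", base, partn);
    if (n < 0 || (size_t)n >= by_num_len)
        return -1;

    return 0;
}
```

Called with the base "/dev/block/platform/soc", PARTNAME "system" and PARTN 1, the helper produces the two example paths quoted in the commit message.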

4ee86babe blk-mq: add bounds check on tag-to-rq conversion

We need to check for a valid index before accessing the array
element to avoid accessing invalid memory regions.

Reviewed-by: Christoph Hellwig
Reviewed-by: Jeff Moyer

Modified by Jens to drop the unlikely(), and to make the valid-tag
case the fall-through path.

Signed-off-by: Jens Axboe

Hannes Reinecke
2016-03-16 03:03:28 +0800
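
The bounds check from 4ee86babe amounts to validating the index before touching the array. The following is a minimal sketch with hypothetical types (`demo_tags`, `demo_request`) and a hypothetical function name, not the blk-mq definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the request and tag-map structures. */
struct demo_request {
    unsigned int tag;
};

struct demo_tags {
    unsigned int nr_tags;       /* number of valid slots in rqs[] */
    struct demo_request **rqs;  /* tag -> request lookup table */
};

/* Returns the request for a tag, or NULL if the tag is out of range. */
static struct demo_request *demo_tag_to_rq(const struct demo_tags *tags,
                                           unsigned int tag)
{
    assert(tags != NULL);

    if (tag < tags->nr_tags)
        return tags->rqs[tag];

    return NULL;
}
```

Callers then have to handle a NULL result, which is the price of not trusting a tag value reported by the device.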

14 Mar, 2016

2 commits

2b8855171 block: bio_remaining_done() isn't unlikely

We use bio chaining during most I/Os these days due to the delayed
bio splitting. Additionally XFS will start using it, and there is
a pending direct I/O rewrite also making heavy use of it. Don't
pretend it's always unlikely, and let the branch predictor do its
job instead.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2016-03-14 22:55:25 +0800

ba8c6967b block: cleanup bio_endio

Replace the while loop that unnecessarily checks for a NULL bio in the fast
path with a simple goto loop.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2016-03-14 22:55:24 +0800
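
A goto-based completion loop of the kind ba8c6967b describes can be sketched as follows. This is a simplified model and not the kernel's bio_endio(): `demo_bio`, its fields and `demo_endio()` are hypothetical, and the `remaining` counter is assumed to have been set up front to the number of completions each bio is waiting for.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much-reduced model of a chained bio. */
struct demo_bio {
    int remaining;           /* completions still outstanding for this bio */
    struct demo_bio *parent; /* non-NULL if this bio is chained to a parent */
    int ended;               /* set once the final completion has run */
};

static void demo_endio(struct demo_bio *bio)
{
    assert(bio != NULL);
again:
    /* Someone else still has to complete this bio: nothing more to do. */
    if (--bio->remaining > 0)
        return;

    /* A finished child completes its parent in the same pass. */
    if (bio->parent) {
        bio = bio->parent;
        goto again;
    }

    bio->ended = 1;
}
```

Once past the entry check, `bio` is never NULL inside the loop, so no NULL test is repeated on each pass as a `while (bio)` form would require.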