Eric Lee / smarc-fsl-linux-kernel

03 Oct, 2017

4 commits

bc48f001d buffer: eliminate the need to call free_more_memory() in __getblk_slow()

Since the previous commit removed any case where grow_buffers()
would return failure due to memory allocations, we can safely
remove the case where we have to call free_more_memory() in
this function.

Since this is also the last user of free_more_memory(), kill
it off completely.

Reviewed-by: Nikolay Borisov
Reviewed-by: Jan Kara
Signed-off-by: Jens Axboe

Jens Axboe
2017-10-03 22:38:17 +0800
94dc24c0c buffer: grow_dev_page() should use __GFP_NOFAIL for all cases

We currently use it for find_or_create_page(), which means that it
cannot fail. Ensure we also pass in 'retry == true' to
alloc_page_buffers(), which also ensures that it cannot fail.

After this, there are no failure cases in grow_dev_page() that
occur because of a failed memory allocation.

Reviewed-by: Nikolay Borisov
Reviewed-by: Jan Kara
Signed-off-by: Jens Axboe

Jens Axboe
2017-10-03 22:38:17 +0800
640ab98fb buffer: have alloc_page_buffers() use __GFP_NOFAIL

Instead of adding weird retry logic in that function, utilize
__GFP_NOFAIL to ensure that the vm takes care of handling any
potential retries appropriately. This means we don't have to
call free_more_memory() from here.
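
As a rough userspace illustration of the idea (not the kernel code), the sketch below moves the retry loop into the allocation wrapper so that callers never see a failure. The names alloc_fn, alloc_nofail(), flaky_alloc() and make_buffer() are hypothetical; in the kernel the VM does this work when __GFP_NOFAIL is passed.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator hook, so that failures can be injected. */
typedef void *(*alloc_fn)(size_t size);

/* Test double: fails 'failures_left' times, then falls back to malloc(). */
static int failures_left;

static void *flaky_alloc(size_t size)
{
    if (failures_left > 0) {
        failures_left--;
        return NULL;
    }
    return malloc(size);
}

/*
 * Model of a "no-fail" allocation: the retry lives inside the wrapper,
 * so the caller needs no retry logic of its own. A plain loop stands in
 * for what the VM does on behalf of a __GFP_NOFAIL request.
 */
static void *alloc_nofail(alloc_fn alloc, size_t size)
{
    void *p;

    while ((p = alloc(size)) == NULL)
        ; /* a real allocator would reclaim memory or sleep here */
    return p;
}

/* Caller side: no NULL check and no hand-rolled retry. */
static char *make_buffer(alloc_fn alloc, size_t size)
{
    char *buf = alloc_nofail(alloc, size);

    buf[0] = '\0';
    return buf;
}
```

With failures_left set to 3, make_buffer(flaky_alloc, 16) still returns a usable buffer; the three failures are absorbed inside alloc_nofail().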

Reviewed-by: Nikolay Borisov
Reviewed-by: Jan Kara
Signed-off-by: Jens Axboe

Jens Axboe
2017-10-03 22:38:17 +0800
7beb2f845 blk-mq: wire up completion notifier for laptop mode

For some reason, the laptop mode IO completion notifier was never wired
up for blk-mq. Ensure that we trigger the callback appropriately, to arm
the laptop mode flush timer.

Reviewed-by: Christoph Hellwig
Reviewed-by: Bart Van Assche
Signed-off-by: Jens Axboe

Jens Axboe
2017-10-03 22:38:17 +0800

01 Oct, 2017

1 commit

5385fa47d blk-mq-tag: kill unused tag enums

We don't have any notion of a tagging cache anymore, and haven't
for a long time. Kill off the unused enums.

Signed-off-by: Jens Axboe

Jens Axboe
2017-10-01 15:26:21 +0800

30 Sep, 2017

2 commits

547248736 blk-mq: remove unused function hctx_allow_merges

Since 9bddeb2a5b981 "blk-mq: make per-sw-queue bio merge as default .bio_merge"
there is no caller for this function.

Reviewed-by: Ming Lei
Signed-off-by: weiping zhang
Signed-off-by: Jens Axboe

weiping zhang
2017-09-30 16:17:37 +0800
b3cffc387 null_blk: add "no_sched" module parameter

Add an option that disables the io scheduler for the null block device.

Signed-off-by: weiping zhang
Signed-off-by: Jens Axboe

weiping zhang
2017-09-30 16:07:34 +0800

27 Sep, 2017

1 commit

0b508bc92 block: fix a build error

The code is only for blkcg, not for all cgroups.

Fixes: d4478e92d618 ("block/loop: make loop cgroup aware")
Reported-by: kbuild test robot
Signed-off-by: Shaohua Li
Signed-off-by: Jens Axboe

Shaohua Li
2017-09-27 02:07:24 +0800

26 Sep, 2017

20 commits

9979d545c block: cryptoloop - Fix build warning

This patch fixes the following build warning:
drivers/block/cryptoloop.c:46:8: warning: variable 'cipher' set but not used [-Wunused-but-set-variable]

Signed-off-by: Corentin Labbe
Signed-off-by: Jens Axboe

Corentin Labbe
2017-09-26 21:41:22 +0800
d4478e92d block/loop: make loop cgroup aware

loop block device handles IO in a separate thread. The actual IO
dispatched isn't cloned from the IO loop device received, so the
dispatched IO loses the cgroup context.

I'm ignoring the buffered IO case for now, which is quite complicated. Making
the loop thread aware of the cgroup context doesn't really help. The loop
device only writes to a single file. In the current writeback cgroup
implementation, the file can only belong to one cgroup.

For the direct IO case, we could work around the issue in theory. For
example, say we assign cgroup1 5M/s BW for the loop device and cgroup2
10M/s. We can create a special cgroup for the loop thread and assign at
least 15M/s for the underlying disk. In this way, we correctly throttle
the two cgroups. But this is tricky to set up.

This patch tries to address the issue. We record bio's css in loop
command. When loop thread is handling the command, we then use the API
provided in patch 1 to set the css for current task. The bio layer will
use the css for new IO (from patch 3).

Acked-by: Tejun Heo
Signed-off-by: Shaohua Li
Signed-off-by: Jens Axboe

Shaohua Li
2017-09-26 21:41:22 +0800
902ec5b6d block: make blkcg aware of kthread stored original cgroup info

bio_blkcg is the only API to get cgroup info for a bio right now. If
bio_blkcg finds current task is a kthread and has original blkcg
associated, it will use the css instead of associating the bio to
current task. This makes it possible that kthread dispatches bios on
behalf of other threads.
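
The selection rule described above can be modelled in userspace roughly as follows. This is only a sketch: struct css, struct task and pick_css() are hypothetical stand-ins, not the kernel's types or the real bio_blkcg() signature.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel types; field names are illustrative. */
struct css {
    int id;
};

struct task {
    bool is_kthread;
    struct css *own_css;    /* cgroup the task itself belongs to */
    struct css *stored_css; /* original submitter's cgroup (kthreads only) */
};

/*
 * Prefer the cgroup info stored in a kthread, if any; otherwise fall
 * back to the cgroup of the task that is issuing the IO.
 */
static struct css *pick_css(const struct task *current_task)
{
    if (current_task->is_kthread && current_task->stored_css)
        return current_task->stored_css;
    return current_task->own_css;
}
```

A kthread with nothing stored is still charged to its own cgroup, so existing kthreads behave as before under this model.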

Acked-by: Tejun Heo
Signed-off-by: Shaohua Li
Signed-off-by: Jens Axboe

Shaohua Li
2017-09-26 21:41:22 +0800
af551fb3b blkcg: delete unused APIs

Nobody uses the APIs right now.

Acked-by: Tejun Heo
Signed-off-by: Shaohua Li
Signed-off-by: Jens Axboe

Shaohua Li
2017-09-26 21:41:22 +0800
05e3db95e kthread: add a mechanism to store cgroup info

kthread usually runs jobs on behalf of other threads. The jobs should be
charged to cgroup of original threads. But the jobs run in a kthread,
where we lose the cgroup context of original threads. The patch adds a
mechanism to record cgroup info of original threads in kthread context.
Later we can retrieve the cgroup info and attach the cgroup info to jobs.

Since this mechanism is only required by kthread, we store the cgroup
info in kthread data instead of generic task_struct.

Acked-by: Tejun Heo
Signed-off-by: Shaohua Li
Signed-off-by: Jens Axboe

Shaohua Li
2017-09-26 21:41:22 +0800
e365806ac Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull compat fix from Al Viro:
"I really wish gcc warned about conversions from pointer to function
into void *..."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fix a typo in put_compat_shm_info()

Linus Torvalds
2017-09-26 09:24:14 +0800
b776e4b1a fix a typo in put_compat_shm_info()

"uip" misspelled as "up"; unfortunately, the latter happens to be
a function and gcc is happy to convert it to void *...

Signed-off-by: Al Viro

Al Viro
2017-09-26 08:41:46 +0800
19240e6b2 Merge branch 'for-linus' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

- Two sets of NVMe pull requests from Christoph:
- Fixes for the Fibre Channel host/target to fix spec compliance
- Allow a zero keep alive timeout
- Make the debug printk for broken SGLs work better
- Fix queue zeroing during initialization
- Set of RDMA and FC fixes
- Target div-by-zero fix

- bsg double-free fix.

- nbd unknown ioctl fix from Josef.

- Buffered vs O_DIRECT page cache inconsistency fix. Has been floating
around for a long time, well reviewed. From Lukas.

- brd overflow fix from Mikulas.

- Fix for a loop regression in this merge window, where using a union
for two members of the loop_cmd turned out to be a really bad idea.
From Omar.

- Fix for an iostat regression in this series, using the wrong API
to get at the block queue. From Shaohua.

- Fix for a potential blktrace deletion deadlock. From Waiman.

* 'for-linus' of git://git.kernel.dk/linux-block: (30 commits)
nvme-fcloop: fix port deletes and callbacks
nvmet-fc: sync header templates with comments
nvmet-fc: ensure target queue id within range.
nvmet-fc: on port remove call put outside lock
nvme-rdma: don't fully stop the controller in error recovery
nvme-rdma: give up reconnect if state change fails
nvme-core: Use nvme_wq to queue async events and fw activation
nvme: fix sqhd reference when admin queue connect fails
block: fix a crash caused by wrong API
fs: Fix page cache inconsistency when mixing buffered and AIO DIO
nvmet: implement valid sqhd values in completions
nvme-fabrics: Allow 0 as KATO value
nvme: allow timed-out ios to retry
nvme: stop aer posting if controller state not live
nvme-pci: Print invalid SGL only once
nvme-pci: initialize queue memory before interrupts
nvmet-fc: fix failing max io queue connections
nvme-fc: use transport-specific sgl format
nvme: add transport SGL definitions
nvme.h: remove FC transport-specific error values
...

Linus Torvalds
2017-09-26 06:46:04 +0800
17763641f Merge tag 'gfs2-for-linus-4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 fix from Bob Peterson:
"GFS2: Fix an old regression in GFS2's debugfs interface

This fixes a regression introduced by commit 88ffbf3e037e ("GFS2: Use
resizable hash table for glocks"). The regression caused the glock dump
in debugfs to not report all the glocks, which makes debugging
extremely difficult"

* tag 'gfs2-for-linus-4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Fix debugfs glocks dump

Linus Torvalds
2017-09-26 06:41:56 +0800
cf0346161 Merge tag 'microblaze-4.14-rc3' of git://git.monstr.eu/linux-2.6-microblaze

Pull Microblaze fixes from Michal Simek:

- Kbuild fix

- use vma_pages

- setup default little endians

* tag 'microblaze-4.14-rc3' of git://git.monstr.eu/linux-2.6-microblaze:
arch: change default endian for microblaze
microblaze: Cocci spatch "vma_pages"
microblaze: Add missing kvm_para.h to Kbuild

Linus Torvalds
2017-09-26 06:37:19 +0800
ac0a36461 Merge tag 'trace-v4.14-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
"Stack tracing and RCU have been having issues with each other and
lockdep has been pointing out constant problems.

The changes have been going into the stack tracer, but it has been
discovered that the problem isn't with the stack tracer itself, but it
is with calling save_stack_trace() from within the internals of RCU.

The stack tracer is the one that can trigger the issue the easiest,
but examining the problem further, it could also happen from a WARN()
in the wrong place, or even if an NMI happened in this area and it did
an rcu_read_lock().

The critical area is where RCU is not watching. Which can happen while
going to and from idle, or bringing up or taking down a CPU.

The final fix was to put the protection in kernel_text_address() as it
is the one that requires RCU to be watching while doing the stack
trace.

To make this work properly, Paul had to allow rcu_irq_enter() to happen
after rcu_nmi_enter(). This should have been done anyway, since an NMI
can page fault (reading vmalloc area), and a page fault triggers
rcu_irq_enter().

One patch is just a consolidation of code so that the fix only needed
to be done in one location"

* tag 'trace-v4.14-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Remove RCU work arounds from stack tracer
extable: Enable RCU if it is not watching in kernel_text_address()
extable: Consolidate *kernel_text_address() functions
rcu: Allow for page faults in NMI handlers

Linus Torvalds
2017-09-26 06:22:31 +0800
fddc9923c nvme-fcloop: fix port deletes and callbacks

Now that there are potentially long delays between when a remoteport or
targetport delete call is made and when the callback occurs (dev_loss_tmo
timeout), no longer block in the delete routines and move the final nport
puts to the callbacks.

Moved the fcloop_nport_get/put/free routines to avoid forward declarations.

Ensure port_info structs used in registrations are nulled in case fields
are not set (ex: devloss_tmo values).

Signed-off-by: James Smart
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

James Smart
2017-09-26 02:42:11 +0800
6b71f9e1e nvmet-fc: sync header templates with comments

Comments were incorrect:
- defer_rcv was in host port template. moved to target port template
- Added Mandatory statements for target port template items

Signed-off-by: James Smart
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

James Smart
2017-09-26 02:42:11 +0800
0c319d3a1 nvmet-fc: ensure target queue id within range.

When searching for queue id's ensure they are within the expected range.

Signed-off-by: James Smart
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

James Smart
2017-09-26 02:42:11 +0800
3688feb58 nvmet-fc: on port remove call put outside lock

Avoid calling the put routine while holding the target lock, as it may
traverse to free routines.

Signed-off-by: James Smart
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

James Smart
2017-09-26 02:42:11 +0800
e4d753d7e nvme-rdma: don't fully stop the controller in error recovery

Calling nvme_stop_ctrl on an already failed controller will wait for the
scan work to complete (only by identify timeout expiration, which is 60
seconds). This is unnecessary when we already know that the controller has
failed.

Reported-by: Yi Zhang
Signed-off-by: Sagi Grimberg
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Sagi Grimberg
2017-09-26 02:42:11 +0800
0a960afd6 nvme-rdma: give up reconnect if state change fails

If we failed to transition to state LIVE after a successful reconnect,
then controller deletion already started. In this case there is no
point moving forward with reconnect.

Reviewed-by: Johannes Thumshirn
Signed-off-by: Sagi Grimberg
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Sagi Grimberg
2017-09-26 02:42:11 +0800
1a40d9728 nvme-core: Use nvme_wq to queue async events and fw activation

async_event_work might race as it is executed from two different
workqueues at the moment.

Reviewed-by: Johannes Thumshirn
Signed-off-by: Sagi Grimberg
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Sagi Grimberg
2017-09-26 02:42:11 +0800
8cbd96a62 nvme: fix sqhd reference when admin queue connect fails

Fix bug in sqhd patch.

It wasn't the sq that was at risk. In the case where the admin queue
connect command fails, the sq->size field is not set. Therefore, this
becomes a divide by zero error.

Add a quick check to bypass under this failure condition.
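
A minimal userspace sketch of the failure mode and the bypass, assuming the head pointer advances modulo the queue size. struct sq_model and sq_update_sqhd() are illustrative names, not the driver's.

```c
#include <stdint.h>

/* Illustrative model of a submission queue; not the kernel structure. */
struct sq_model {
    uint16_t sqhd; /* submission queue head pointer */
    uint16_t size; /* stays 0 if the connect command failed */
};

/*
 * Advance the head pointer with wrap-around. Without the size check,
 * the modulo below would divide by zero when size was never set.
 */
static void sq_update_sqhd(struct sq_model *sq)
{
    if (sq->size == 0)
        return; /* bypass: queue was never set up */
    sq->sqhd = (uint16_t)((sq->sqhd + 1) % sq->size);
}
```

Calling sq_update_sqhd() on a zero-initialised struct sq_model leaves sqhd untouched instead of faulting.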

Signed-off-by: James Smart
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

James Smart
2017-09-26 02:42:11 +0800
10201655b gfs2: Fix debugfs glocks dump

The switch to rhashtables (commit 88ffbf3e03) broke the debugfs glock
dump (/sys/kernel/debug/gfs2//glocks) for dumps bigger than a
single buffer: the right function for restarting an rhashtable iteration
from the beginning of the hash table is rhashtable_walk_enter;
rhashtable_walk_stop + rhashtable_walk_start will just resume from the
current position.
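
The difference between resuming and restarting can be illustrated with a toy iterator over an array. This is a simplified model of the behaviour described above, not the rhashtable API; struct walker and the walk_*() helpers are hypothetical names chosen to mirror it.

```c
#include <stddef.h>

/* Toy walker over an int array; models only the position handling. */
struct walker {
    const int *items;
    size_t count;
    size_t pos;
    int active;
};

/* Entering a walk resets the position to the beginning. */
static void walk_enter(struct walker *w, const int *items, size_t count)
{
    w->items = items;
    w->count = count;
    w->pos = 0;
    w->active = 0;
}

/* Start and stop only toggle the walk; the position is preserved. */
static void walk_start(struct walker *w)
{
    w->active = 1;
}

static void walk_stop(struct walker *w)
{
    w->active = 0;
}

/* Return the next item, or NULL when stopped or at the end. */
static const int *walk_next(struct walker *w)
{
    if (!w->active || w->pos >= w->count)
        return NULL;
    return &w->items[w->pos++];
}
```

In this model, a dump that must begin again from the first entry has to go through walk_enter(); a walk_stop() followed by walk_start() continues from wherever the previous buffer ended.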

Signed-off-by: Andreas Gruenbacher
Signed-off-by: Bob Peterson
Cc: stable@vger.kernel.org # v4.3+

Andreas Gruenbacher
2017-09-26 01:32:33 +0800

25 Sep, 2017

12 commits

f5c156c4c block: fix a crash caused by wrong API

part_stat_show takes a part device not a disk, so we should use
part_to_disk.

Fixes: d62e26b3ffd2 ("block: pass in queue to inflight accounting")
Cc: Bart Van Assche
Cc: Omar Sandoval
Signed-off-by: Shaohua Li
Signed-off-by: Jens Axboe

Shaohua Li
2017-09-25 22:56:05 +0800
332391a99 fs: Fix page cache inconsistency when mixing buffered and AIO DIO

Currently when mixing buffered reads and asynchronous direct writes it
is possible to end up with the situation where we have stale data in the
page cache while the new data is already written to disk. This is
permanent until the affected pages are flushed away. Despite the fact
that mixing buffered and direct IO is ill-advised, it does pose a threat
to data integrity, is unexpected and should be fixed.

Fix this by deferring completion of asynchronous direct writes to a
process context in the case that there are mapped pages to be found in
the inode. Later before the completion in dio_complete() invalidate
the pages in question. This ensures that after the completion the pages
in the written area are either unmapped, or populated with up-to-date
data. Also do the same for the iomap case which uses
iomap_dio_complete() instead.

This has a side effect of deferring the completion to a process context
for every AIO DIO that happens on inode that has pages mapped. However
since the consensus is that this is ill-advised practice the performance
implication should not be a problem.

This was based on proposal from Jeff Moyer, thanks!

Reviewed-by: Jan Kara
Reviewed-by: Darrick J. Wong
Reviewed-by: Jeff Moyer
Signed-off-by: Lukas Czerner
Signed-off-by: Jens Axboe

Lukas Czerner
2017-09-25 22:56:05 +0800
bb1cc7479 nvmet: implement valid sqhd values in completions

To support sqhd, for initiators that are following the spec and
paying attention to sqhd vs their sqtail values:

- add sqhd to struct nvmet_sq
- initialize sqhd to 0 in nvmet_sq_setup
- rather than propagate the 0's-based qsize value from the connect message,
which requires a +1 in every sqhd update, and as nothing else references
it, convert to a 1's-based value in the nvmet_sq/cq_setup() calls.
- validate that the connect message sqsize is non-zero, per spec.
- assign an updated sqhd to every completion that goes back.

Also remove handling the NULL sq case in __nvmet_req_complete, as it can't
happen with the current code.
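
The size handling in the list above might look roughly like this in isolation. It is a sketch under the stated reading of the commit message; sq_size_from_connect() is a hypothetical helper, and the wider uint32_t result is a choice made here to avoid overflow at 0xFFFF.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Validate the 0's-based sqsize from a connect message and convert it
 * once to a 1's-based queue size, so later sqhd updates need no "+ 1".
 * Returns false for a zero sqsize, which the message above says must be
 * rejected per spec.
 */
static bool sq_size_from_connect(uint16_t sqsize_zero_based, uint32_t *size_out)
{
    if (sqsize_zero_based == 0)
        return false;
    *size_out = (uint32_t)sqsize_zero_based + 1;
    return true;
}
```

For example, a connect message carrying sqsize 31 yields a queue size of 32, while sqsize 0 is refused.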

Signed-off-by: James Smart
Reviewed-by: Sagi Grimberg
Reviewed-by: Max Gurtovoy
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

James Smart
2017-09-25 22:56:05 +0800
8edd11c9a nvme-fabrics: Allow 0 as KATO value

Currently, driver code allows user to set 0 as KATO
(Keep Alive TimeOut), but this is not being respected.
This patch enforces the expected behavior.

Signed-off-by: Guilherme G. Piccoli
Reviewed-by: Sagi Grimberg
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Guilherme G. Piccoli
2017-09-25 22:56:05 +0800
0951338d9 nvme: allow timed-out ios to retry

Currently the nvme_req_needs_retry() applies several checks to see if
a retry is allowed. One of those is whether the current time has exceeded
the start time of the io plus the timeout length. This check, if an io
times out, means there is never a retry allowed for the io. Which means
applications see the io failure.

Remove this check and allow the io to timeout, like it does on other
protocols, and retries to be made.
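
The effect of dropping the time check can be sketched with a small model. Only the elapsed-time check is taken from the text above; the other checks (a no-retry flag, a do-not-retry status and a retry limit) are assumptions added for illustration, and struct req_model is not the driver's request structure.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative request model; fields other than the times are assumed. */
struct req_model {
    bool no_retry;        /* submitter asked for no retries (assumed check) */
    bool do_not_retry;    /* device status forbids a retry (assumed check) */
    unsigned int retries; /* retries already performed */
    uint64_t start_time;  /* when the io was started */
    uint64_t timeout;     /* timeout length, same unit as start_time */
};

/* Before: an io that has timed out can never be retried. */
static bool needs_retry_old(const struct req_model *r, uint64_t now,
                            unsigned int max_retries)
{
    if (r->no_retry || r->do_not_retry)
        return false;
    if (now - r->start_time >= r->timeout)
        return false; /* the check this patch removes */
    return r->retries < max_retries;
}

/* After: only the remaining checks decide. */
static bool needs_retry_new(const struct req_model *r, unsigned int max_retries)
{
    if (r->no_retry || r->do_not_retry)
        return false;
    return r->retries < max_retries;
}
```

For a request started at time 100 with a timeout of 30, the old model refuses a retry at time 130, while the new one allows it until the retry limit is reached.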

On the FC transport, a frame can be lost for an individual io, and there
may be no other errors that escalate for the connection/association.
The io will timeout, which causes the transport to escalate into creating
a new association, but the io that timed out, due to this retry logic, has
already failed back to the application and things are hosed.

Signed-off-by: James Smart
Reviewed-by: Keith Busch
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

James Smart
2017-09-25 22:56:05 +0800
cd48282cc nvme: stop aer posting if controller state not live

If an nvme async_event command completes, in most cases, a new
async event is posted. However, if the controller enters a
resetting or reconnecting state, there is nothing to block the
scheduled work element from posting the async event again. Nor are
there calls from the transport to stop async events when an
association dies.

In the case of FC, where the association is torn down, the aer must
be aborted on the FC link and completes through the normal job
completion path. Thus the terminated async event ends up being
rescheduled even though the controller isn't in a valid state for
the aer, and the reposting gets the transport into a partially torn
down data structure.

It's possible to hit the scenario on rdma, although much less likely
due to an aer completing right as the association is terminated and
as the association teardown reclaims the blk requests via
nvme_cancel_request() so it's immediate, not a link-related action
like on FC.

Fix by putting controller state checks in both the async event
completion routine where it schedules the async event and in the
async event work routine before it calls into the transport. It's
effectively a "stop_async_events()" behavior. The transport, when
it creates a new association with the subsystem will transition
the state back to live and is already restarting the async event
posting.

Signed-off-by: James Smart
[hch: remove taking a lock over reading the controller state]
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

James Smart
2017-09-25 22:56:05 +0800
d08774738 nvme-pci: Print invalid SGL only once

The WARN_ONCE macro returns true if the condition is true, not if the
warn was raised, so we're printing the scatter list every time it's
invalid. This is excessive and makes debugging harder, so this patch
prints it just once.
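
The pitfall can be reproduced in userspace with a small imitation of the warn-once pattern. warn_once_model() below is a hypothetical helper, not the kernel macro (which keeps its flag internally), and check_sgl_fixed() shows just one possible way to limit the dump, not necessarily what the patch does.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Imitation of the warn-once pattern: the message is printed only the
 * first time, but the return value is the condition itself, so it is
 * true on every call where the condition holds.
 */
static bool warn_once_model(bool cond, bool *warned, const char *msg)
{
    if (cond && !*warned) {
        *warned = true;
        fprintf(stderr, "WARNING: %s\n", msg);
    }
    return cond;
}

static int dumps_buggy; /* counts scatter list dumps, buggy variant */
static int dumps_fixed; /* counts scatter list dumps, fixed variant */

/* Buggy: the dump is keyed off the return value, so it repeats. */
static void check_sgl_buggy(bool invalid)
{
    static bool warned;

    if (warn_once_model(invalid, &warned, "invalid SGL"))
        dumps_buggy++;
}

/* One possible fix: guard the dump with its own once-only flag. */
static void check_sgl_fixed(bool invalid)
{
    static bool warned, dumped;

    if (warn_once_model(invalid, &warned, "invalid SGL") && !dumped) {
        dumped = true;
        dumps_fixed++;
    }
}
```

After three invalid lists, dumps_buggy is 3 while dumps_fixed is 1, although each variant printed the warning text only once.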

Signed-off-by: Keith Busch
Reviewed-by: Martin K. Petersen
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Keith Busch
2017-09-25 22:56:05 +0800
161b8be2b nvme-pci: initialize queue memory before interrupts

A spurious interrupt before the nvme driver has initialized the completion
queue may inadvertently cause the driver to believe it has a completion
to process. This may result in a NULL dereference since the nvmeq's tags
are not set at this point.

The patch initializes the host's CQ memory so that a spurious interrupt
isn't mistaken for a real completion.
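
Why zeroed queue memory helps can be shown with a simplified model of phase-tag checking. It assumes the usual NVMe convention that bit 0 of the completion status is the phase tag and that the host expects phase 1 on the first pass; struct cqe_model, struct cq_model and the helpers are illustrative, not the driver's definitions.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Simplified completion entry: only the status word, bit 0 = phase tag. */
struct cqe_model {
    uint16_t status;
};

struct cq_model {
    struct cqe_model *cqes;
    uint16_t depth;
    uint16_t head;
    uint8_t phase; /* phase value the host expects for new entries */
};

/* Zero the queue memory before the queue can be looked at. */
static void cq_init(struct cq_model *cq, struct cqe_model *mem, uint16_t depth)
{
    memset(mem, 0, (size_t)depth * sizeof(*mem));
    cq->cqes = mem;
    cq->depth = depth;
    cq->head = 0;
    cq->phase = 1;
}

/* An entry counts as a new completion only if its phase bit matches. */
static bool cqe_pending(const struct cq_model *cq)
{
    return (cq->cqes[cq->head].status & 1) == cq->phase;
}
```

With uninitialised memory the entry at the head may happen to carry a matching phase bit; after cq_init() it cannot, so a spurious interrupt finds nothing to process.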

Signed-off-by: Keith Busch
Reviewed-by: Johannes Thumshirn
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Keith Busch
2017-09-25 22:56:05 +0800
deb61742e nvmet-fc: fix failing max io queue connections

fc transport is treating NVMET_NR_QUEUES as maximum queue count, e.g.
admin queue plus NVMET_NR_QUEUES-1 io queues. But NVMET_NR_QUEUES is
the number of io queues, so maximum queue count is really
NVMET_NR_QUEUES+1.

Fix the handling in the target fc transport
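
The off-by-one can be sketched as a queue id range check, assuming queue id 0 is the admin queue and io queues use ids 1 through the io queue count. MODEL_NR_IO_QUEUES is a made-up stand-in for NVMET_NR_QUEUES and the helper names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical stand-in for NVMET_NR_QUEUES (number of io queues). */
#define MODEL_NR_IO_QUEUES 4

/* Buggy: treats the constant as the total queue count, admin included. */
static bool qid_valid_buggy(unsigned int qid)
{
    return qid < MODEL_NR_IO_QUEUES;
}

/* Fixed: total queue count is the io queue count plus the admin queue. */
static bool qid_valid_fixed(unsigned int qid)
{
    return qid <= MODEL_NR_IO_QUEUES;
}
```

With four io queues, the buggy check rejects queue id 4 and so refuses the last io queue connection, while the fixed check accepts ids 0 through 4.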

Signed-off-by: James Smart
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

James Smart
2017-09-25 22:56:05 +0800
d9d34c0b2 nvme-fc: use transport-specific sgl format

Sync with NVM Express spec change and FC-NVME 1.18.

FC transport sets SGL type to Transport SGL Data Block Descriptor and
subtype to transport-specific value 0x0A.

Removed the warn-on's on the PRP fields. They are unneeded. They were
there to check for values from the upper layer that weren't set right,
and for the most part were fine. But async events reuse the same
structure, so by the second time one was issued the SGL overlay had
converted the fields to the Transport SGL values, and the warn-on's
fired errantly.

Signed-off-by: James Smart
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

James Smart
2017-09-25 22:56:05 +0800
d85cf2074 nvme: add transport SGL definitions

Add transport SGL definitions from NVMe TP 4008, required for
the final NVMe-FC standard.

Signed-off-by: James Smart
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

James Smart
2017-09-25 22:56:05 +0800
c98cb3bd8 nvme.h: remove FC transport-specific error values

The NVM Express group rescinded the reserved range for the transport.
Remove the FC-centric values that had been defined.

Signed-off-by: James Smart
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

James Smart
2017-09-25 22:56:05 +0800