20 Oct, 2020

1 commit

  • Pull fuse updates from Miklos Szeredi:

    - Support directly accessing host page cache from virtiofs. This can
    improve I/O performance for various workloads, as well as reduce
    the memory requirement by eliminating double caching. Thanks to Vivek
    Goyal for doing most of the work on this.

    - Allow automatic submounting inside virtiofs. This allows unique
    st_dev/st_ino values to be assigned inside the guest to files
    residing on different filesystems on the host. Thanks to Max Reitz
    for the patches.

    - Fix an old use-after-free bug found by Pradeep P V K.

    * tag 'fuse-update-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (25 commits)
    virtiofs: calculate number of scatter-gather elements accurately
    fuse: connection remove fix
    fuse: implement crossmounts
    fuse: Allow fuse_fill_super_common() for submounts
    fuse: split fuse_mount off of fuse_conn
    fuse: drop fuse_conn parameter where possible
    fuse: store fuse_conn in fuse_req
    fuse: add submount support to <uapi>
    fuse: fix page dereference after free
    virtiofs: add logic to free up a memory range
    virtiofs: maintain a list of busy elements
    virtiofs: serialize truncate/punch_hole and dax fault path
    virtiofs: define dax address space operations
    virtiofs: add DAX mmap support
    virtiofs: implement dax read/write operations
    virtiofs: introduce setupmapping/removemapping commands
    virtiofs: implement FUSE_INIT map_alignment field
    virtiofs: keep a list of free dax memory ranges
    virtiofs: add a mount option to enable dax
    virtiofs: set up virtio_fs dax_device
    ...

    Linus Torvalds
     

21 Sep, 2020

1 commit

  • Pass the full length to iomap_zero() and dax_iomap_zero(), and have
    them return how many bytes they actually handled. This is preparatory
    work for handling THP, although it looks like DAX could actually take
    advantage of it if there's a larger contiguous area.
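
    The calling-convention change can be pictured roughly as follows
    (prototypes paraphrased from the description above, not copied from the
    patch):

        /* before: the helpers zero at most one page; callers loop */
        int dax_iomap_zero(loff_t pos, unsigned offset, unsigned size,
                           struct iomap *iomap);

        /* after: pass the full length and learn how many bytes were handled */
        s64 dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap);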

    Signed-off-by: Matthew Wilcox (Oracle)
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig

    Matthew Wilcox (Oracle)
     

10 Sep, 2020

1 commit

  • The virtiofs device has a range of memory which is mapped into file inodes
    using dax. This memory is mapped by qemu on the host and maps different
    sections of real files on the host. The size of this memory is limited
    (determined by the administrator) and, depending on the filesystem size, we
    will soon reach a situation where all the memory is in use and we need to
    reclaim some.

    As part of the reclaim process, we will need to make sure that there are
    no active references to pages (taken by get_user_pages()) in the memory
    range we are trying to reclaim. I am planning to use
    dax_layout_busy_page() for this. But in its current form this is per inode
    and scans through all the pages of the inode.

    We want to reclaim only a portion of memory (say, a 2MB page), so we want
    to make sure that only that 2MB range of pages has no outstanding
    references (and we don't want to unmap all the pages of the inode).

    Hence, create a range version of this function named
    dax_layout_busy_page_range() which can be used to pass a range which
    needs to be unmapped.
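
    With the range version in place, the existing helper can remain as the
    whole-file special case; a sketch of the relationship (the real code lives
    in fs/dax.c):

        struct page *dax_layout_busy_page_range(struct address_space *mapping,
                                                loff_t start, loff_t end);

        /* the old per-inode helper becomes the "scan everything" case */
        struct page *dax_layout_busy_page(struct address_space *mapping)
        {
                return dax_layout_busy_page_range(mapping, 0, LLONG_MAX);
        }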

    Cc: Dan Williams
    Cc: linux-nvdimm@lists.01.org
    Cc: Jan Kara
    Cc: Vishal L Verma
    Cc: "Weiny, Ira"
    Signed-off-by: Vivek Goyal
    Reviewed-by: Jan Kara
    Signed-off-by: Miklos Szeredi

    Vivek Goyal
     

24 Aug, 2020

1 commit

  • Replace the existing /* fall through */ comments and their variants with
    the new pseudo-keyword macro fallthrough[1]. Also, remove fall-through
    markings where they are unnecessary.

    [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through
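
    A typical conversion looks like this (illustrative only; the case labels
    and helpers are made up, not taken from the patch):

        switch (cmd) {
        case CMD_PREPARE:
                prepare();
                fallthrough;    /* replaces the old fall-through comment */
        case CMD_RUN:
                run();
                break;
        default:
                break;
        }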

    Signed-off-by: Gustavo A. R. Silva

    Gustavo A. R. Silva
     

31 Jul, 2020

1 commit

  • The argument passed to xas_set_err() to indicate an error should be negative.
    Otherwise, xas_error() will return 0, and grab_mapping_entry() will return the
    found entry instead of 'SIGBUS' when the entry is not in fact valid.
    This would result in problems in subsequent code paths.
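
    In code, the fix is just the sign of the value handed to xas_set_err();
    -EIO below is a stand-in errno, not necessarily the one in the patch:

        xas_set_err(&xas, EIO);    /* wrong: xas_error() treats this as "no error" */
        xas_set_err(&xas, -EIO);   /* right: xas_error() now returns -EIO */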

    Link: https://lore.kernel.org/r/20200729034436.24267-1-lihao2018.fnst@cn.fujitsu.com
    Reviewed-by: Pankaj Gupta
    Signed-off-by: Hao Li
    Signed-off-by: Vishal Verma

    Hao Li
     

29 Jul, 2020

1 commit

  • Passing size to copy_user_dax implies it can copy variable sizes of data,
    when in fact it calls copy_user_page(), which copies exactly one page.

    We are safe because the only caller uses PAGE_SIZE anyway so just remove
    the variable for clarity.

    While we are at it change copy_user_dax() to copy_cow_page_dax() to make
    it clear it is a singleton helper for this one case not implementing
    what dax_iomap_actor() does.
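
    The rename and the dropped argument look roughly like this (prototypes
    paraphrased from the description, not copied from the diff):

        /* before: a "size" argument that in practice is always PAGE_SIZE */
        static int copy_user_dax(struct block_device *bdev, struct dax_device *dax_dev,
                                 sector_t sector, size_t size, struct page *to,
                                 unsigned long vaddr);

        /* after: the helper copies exactly one page for the CoW fault path */
        static int copy_cow_page_dax(struct block_device *bdev, struct dax_device *dax_dev,
                                     sector_t sector, struct page *to,
                                     unsigned long vaddr);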

    Link: https://lore.kernel.org/r/20200717072056.73134-11-ira.weiny@intel.com
    Reviewed-by: Ben Widawsky
    Reviewed-by: Dan Williams
    Signed-off-by: Ira Weiny
    Signed-off-by: Vishal Verma

    Ira Weiny
     

03 Apr, 2020

2 commits

  • Add a helper dax_iomap_zero() to zero a range. This patch basically
    merges __dax_zero_page_range() and iomap_dax_zero().

    Suggested-by: Christoph Hellwig
    Signed-off-by: Vivek Goyal
    Reviewed-by: Christoph Hellwig
    Link: https://lore.kernel.org/r/20200228163456.1587-7-vgoyal@redhat.com
    Signed-off-by: Dan Williams

    Vivek Goyal
     
  • Use the new dax native zero-page method for zeroing a page if the I/O is
    page aligned. Otherwise fall back to direct_access() + memcpy().

    This gets rid of one of the dependencies on the block device in the dax path.
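
    In outline, the zeroing path becomes something like the following sketch,
    assuming the dax_zero_page_range(), dax_direct_access() and dax_flush()
    helpers:

        if (page_aligned) {
                /* let the dax device zero whole pages natively */
                rc = dax_zero_page_range(iomap->dax_dev, pgoff, 1);
        } else {
                /* map the range and zero the sub-page piece by hand */
                rc = dax_direct_access(iomap->dax_dev, pgoff, 1, &kaddr, NULL);
                if (rc < 0)
                        return rc;
                memset(kaddr + offset, 0, size);
                dax_flush(iomap->dax_dev, kaddr + offset, size);
                rc = 0;
        }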

    Signed-off-by: Vivek Goyal
    Link: https://lore.kernel.org/r/20200228163456.1587-6-vgoyal@redhat.com
    Signed-off-by: Dan Williams

    Vivek Goyal
     

06 Feb, 2020

1 commit

  • fstests generic/471 reports a failure when run with MOUNT_OPTIONS="-o
    dax". The reason is that the initial pwrite to an empty file with the
    RWF_NOWAIT flag set does not return -EAGAIN. It turns out that
    dax_iomap_rw doesn't pass that flag through to iomap_apply.

    With this patch applied, generic/471 passes for me.
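
    The fix boils down to translating the iocb flag when dax_iomap_rw() builds
    its iomap flags; a sketch of the pattern, not the literal hunk:

        unsigned int flags = IOMAP_WRITE;       /* write path */

        if (iocb->ki_flags & IOCB_NOWAIT)
                flags |= IOMAP_NOWAIT;          /* lets ->iomap_begin bail out with -EAGAIN */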

    Signed-off-by: Jeff Moyer
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Jan Kara
    Link: https://lore.kernel.org/r/x49r1z86e1d.fsf@segfault.boston.devel.redhat.com
    Signed-off-by: Dan Williams

    Jeff Moyer
     

04 Jan, 2020

1 commit

  • As of now dax_writeback_mapping_range() takes "struct block_device" as a
    parameter and the dax_dev is looked up from the bdev name. This also
    involves taking a fresh reference on the dax_dev and putting that reference
    at the end of the function.

    We are developing a new filesystem, virtio-fs, which uses dax to access the
    host page cache directly. But there is no block device. IOW, we want to
    make use of dax but get rid of the assumption that there is always a block
    device associated with the dax_dev.

    So pass in "struct dax_device" as parameter instead of bdev.

    ext2/ext4/xfs are the current users and they already hold a reference on
    the dax_device, so there is no need to take and drop a reference on each
    call of this function.
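
    The prototype change described above, paraphrased:

        /* before: look the dax_device up from the block device on every call */
        int dax_writeback_mapping_range(struct address_space *mapping,
                                        struct block_device *bdev,
                                        struct writeback_control *wbc);

        /* after: callers such as ext2/ext4/xfs pass their dax_device directly */
        int dax_writeback_mapping_range(struct address_space *mapping,
                                        struct dax_device *dax_dev,
                                        struct writeback_control *wbc);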

    Suggested-by: Christoph Hellwig
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Jan Kara
    Signed-off-by: Vivek Goyal
    Link: https://lore.kernel.org/r/20200103183307.GB13350@redhat.com
    Signed-off-by: Dan Williams

    Vivek Goyal
     

01 Dec, 2019

1 commit

  • Pull iomap updates from Darrick Wong:
    "In this release, we hoisted as much of XFS' writeback code into iomap
    as was practicable, refactored the unshare file data function, added
    the ability to perform buffered io copy on write, and tweaked various
    parts of the directio implementation as needed to port ext4's directio
    code (that will be a separate pull).

    Summary:

    - Make iomap_dio_rw callers explicitly tell us if they want us to
    wait

    - Port the xfs writeback code to iomap to complete the buffered io
    library functions

    - Refactor the unshare code to share common pieces

    - Add support for performing copy on write with buffered writes

    - Other minor fixes

    - Fix unchecked return in iomap_bmap

    - Fix a type casting bug in a ternary statement in
    iomap_dio_bio_actor

    - Improve tracepoints for easier diagnostic ability

    - Fix pipe page leakage in directio reads"

    * tag 'iomap-5.5-merge-11' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (31 commits)
    iomap: Fix pipe page leakage during splicing
    iomap: trace iomap_appply results
    iomap: fix return value of iomap_dio_bio_actor on 32bit systems
    iomap: iomap_bmap should check iomap_apply return value
    iomap: Fix overflow in iomap_page_mkwrite
    fs/iomap: remove redundant check in iomap_dio_rw()
    iomap: use a srcmap for a read-modify-write I/O
    iomap: renumber IOMAP_HOLE to 0
    iomap: use write_begin to read pages to unshare
    iomap: move the zeroing case out of iomap_read_page_sync
    iomap: ignore non-shared or non-data blocks in xfs_file_dirty
    iomap: always use AOP_FLAG_NOFS in iomap_write_begin
    iomap: remove the unused iomap argument to __iomap_write_end
    iomap: better document the IOMAP_F_* flags
    iomap: enhance writeback error message
    iomap: pass a struct page to iomap_finish_page_writeback
    iomap: cleanup iomap_ioend_compare
    iomap: move struct iomap_page out of iomap.h
    iomap: warn on inline maps in iomap_writepage_map
    iomap: lift the xfs writeback code to iomap
    ...

    Linus Torvalds
     

23 Oct, 2019

1 commit

  • Users reported a v5.3 performance regression and inability to establish
    huge page mappings. A revised version of the ndctl "dax.sh" huge page
    unit test identifies commit 23c84eb78375 "dax: Fix missed wakeup with
    PMD faults" as the source.

    Update get_unlocked_entry() to check for NULL entries before checking
    the entry order, otherwise NULL is misinterpreted as a present pte
    conflict. The 'order' check needs to happen before the locked check as
    an unlocked entry at the wrong order must fallback to lookup the correct
    order.

    Reported-by: Jeff Smits
    Reported-by: Doug Nelson
    Cc:
    Fixes: 23c84eb78375 ("dax: Fix missed wakeup with PMD faults")
    Reviewed-by: Jan Kara
    Cc: Jeff Moyer
    Cc: Matthew Wilcox (Oracle)
    Reviewed-by: Johannes Thumshirn
    Link: https://lore.kernel.org/r/157167532455.3945484.11971474077040503994.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Dan Williams

    Dan Williams
     

21 Oct, 2019

1 commit

  • The srcmap is used to identify where the read is to be performed from.
    It is passed to ->iomap_begin, which can fill it in if we need to read
    data for partially written blocks from a different location than the
    write target. The srcmap is only supported for buffered writes so far.
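
    For reference, the extended ->iomap_begin signature then looks roughly
    like this (sketch; see include/linux/iomap.h for the authoritative
    definition):

        struct iomap_ops {
                int (*iomap_begin)(struct inode *inode, loff_t pos, loff_t length,
                                   unsigned flags, struct iomap *iomap,
                                   struct iomap *srcmap);
                int (*iomap_end)(struct inode *inode, loff_t pos, loff_t length,
                                 ssize_t written, unsigned flags,
                                 struct iomap *iomap);
        };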

    Signed-off-by: Goldwyn Rodrigues
    [hch: merged two patches, removed the IOMAP_F_COW flag, use iomap as
    srcmap if not set, adjust length down to srcmap end as well]
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Acked-by: Goldwyn Rodrigues

    Goldwyn Rodrigues
     

06 Aug, 2019

1 commit

  • Vivek:

    "As of now dax_layout_busy_page() calls unmap_mapping_range() with last
    argument as 1, which says even unmap cow pages. I am wondering who needs
    to get rid of cow pages as well.

    I noticed one interesting side effect of this. I mounted xfs with -o dax,
    mmapped a file with MAP_PRIVATE and wrote some data to a page, which
    created a cow page. Then I called fallocate() on that file to zero a page
    of the file. fallocate() called dax_layout_busy_page(), which unmapped the
    cow pages as well, and when I then tried to read back the data I had
    written, what I got was the old data from persistent memory. I lost the
    data I had written: the read simply resulted in a new fault which read the
    data back from persistent memory.

    This sounds wrong. Are there any users which need to unmap cow pages
    as well? If not, I am proposing changing it to not unmap cow pages.

    I noticed this while writing the virtio_fs code: when I tried to reclaim a
    memory range, it corrupted the executable I was running from virtio-fs and
    the program got a segmentation violation."

    Dan:

    "In fact the unmap_mapping_range() in this path is only to synchronize
    against get_user_pages_fast() and force it to call back into the
    filesystem to re-establish the mapping. COW pages should be left
    untouched by dax_layout_busy_page()."

    Cc:
    Fixes: 5fac7408d828 ("mm, fs, dax: handle layout changes to pinned dax mappings")
    Signed-off-by: Vivek Goyal
    Link: https://lore.kernel.org/r/20190802192956.GA3032@redhat.com
    Signed-off-by: Dan Williams

    Vivek Goyal
     

30 Jul, 2019

1 commit

  • The condition checking whether put_unlocked_entry() needs to wake up a
    following waiter got broken by commit 23c84eb78375 ("dax: Fix missed
    wakeup with PMD faults"). We need to wake the waiter whenever the passed
    entry is valid (i.e., non-NULL and not the special conflict entry); the
    broken condition could lead to processes never being woken up when waiting
    for the entry lock. Fix the condition.
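
    The corrected condition, in sketch form (helper names as used in fs/dax.c;
    paraphrased rather than quoted from the patch):

        /* wake the next waiter for any real entry, not only for the
         * special conflict value */
        if (entry && !dax_is_conflict(entry))
                dax_wake_entry(xas, entry, false);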

    Cc:
    Link: http://lore.kernel.org/r/20190729120228.GC17833@quack2.suse.cz
    Fixes: 23c84eb78375 ("dax: Fix missed wakeup with PMD faults")
    Signed-off-by: Jan Kara
    Signed-off-by: Dan Williams

    Jan Kara
     

20 Jul, 2019

1 commit

  • Pull iomap split/cleanup from Darrick Wong:
    "As promised, here's the second part of the iomap merge for 5.3, in
    which we break up iomap.c into smaller files grouped by functional
    area so that it'll be easier in the long run to maintain cohesiveness
    of code units and to review incoming patches. There are no functional
    changes and fs/iomap.c split cleanly.

    Summary:

    - Regroup the fs/iomap.c code by major functional area so that we can
    start development for 5.4 from a more stable base"

    * tag 'iomap-5.3-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
    iomap: move internal declarations into fs/iomap/
    iomap: move the main iteration code into a separate file
    iomap: move the buffered IO code into a separate file
    iomap: move the direct IO code into a separate file
    iomap: move the SEEK_HOLE code into a separate file
    iomap: move the file mapping reporting code into a separate file
    iomap: move the swapfile code into a separate file
    iomap: start moving code to fs/iomap/

    Linus Torvalds
     

19 Jul, 2019

1 commit

  • Pull dax updates from Dan Williams:
    "The fruits of a bug hunt in the fsdax implementation with Willy and a
    small feature update for device-dax:

    - Fix a hang condition that started triggering after the Xarray
    conversion of fsdax in the v4.20 kernel.

    - Add a 'resource' (root-only physical base address) sysfs attribute
    to device-dax instances to correlate memory-blocks onlined via the
    kmem driver with a given device instance"

    * tag 'dax-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
    dax: Fix missed wakeup with PMD faults
    device-dax: Add a 'resource' attribute

    Linus Torvalds
     

17 Jul, 2019

2 commits

  • Move internal function declarations out of fs/internal.h into
    include/linux/iomap.h so that our transition is complete.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig

    Darrick J. Wong
     
  • RocksDB can hang indefinitely when using a DAX file. This is due to
    a bug in the XArray conversion when handling a PMD fault and finding a
    PTE entry. We use the wrong index in the hash and end up waiting on
    the wrong waitqueue.

    There's actually no need to wait; if we find a PTE entry while looking
    for a PMD entry, we can return immediately as we know we should fall
    back to a PTE fault (which may not conflict with the lock held).

    We reuse the XA_RETRY_ENTRY to signal a conflicting entry was found.
    This value can never be found in an XArray while holding its lock, so
    it does not create an ambiguity.
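
    The conflict marker is simply XA_RETRY_ENTRY behind a small helper; a
    sketch of the idea:

        static bool dax_is_conflict(void *entry)
        {
                return entry == XA_RETRY_ENTRY;
        }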

    Cc:
    Link: http://lkml.kernel.org/r/CAPcyv4hwHpX-MkUEqxwdTj7wCCZCN4RV-L4jsnuwLGyL_UEG4A@mail.gmail.com
    Fixes: b15cd800682f ("dax: Convert page fault handlers to XArray")
    Signed-off-by: Matthew Wilcox (Oracle)
    Tested-by: Dan Williams
    Reported-by: Robert Barror
    Reported-by: Seema Pandit
    Reviewed-by: Jan Kara
    Signed-off-by: Dan Williams

    Matthew Wilcox (Oracle)
     

09 Jul, 2019

1 commit

  • Pull locking updates from Ingo Molnar:
    "The main changes in this cycle are:

    - rwsem scalability improvements, phase #2, by Waiman Long, which are
    rather impressive:

    "On a 2-socket 40-core 80-thread Skylake system with 40 reader
    and writer locking threads, the min/mean/max locking operations
    done in a 5-second testing window before the patchset were:

    40 readers, Iterations Min/Mean/Max = 1,807/1,808/1,810
    40 writers, Iterations Min/Mean/Max = 1,807/50,344/151,255

    After the patchset, they became:

    40 readers, Iterations Min/Mean/Max = 30,057/31,359/32,741
    40 writers, Iterations Min/Mean/Max = 94,466/95,845/97,098"

    There are a lot of changes to the locking implementation that make
    it similar to qrwlock, including owner handoff for more fair
    locking.

    Another microbenchmark shows how the improvements hold up across
    the spectrum:

    "With a locking microbenchmark running on 5.1 based kernel, the
    total locking rates (in kops/s) on a 2-socket Skylake system
    with equal numbers of readers and writers (mixed) before and
    after this patchset were:

    # of Threads   Before Patch   After Patch
    ------------   ------------   -----------
         2             2,618          4,193
         4             1,202          3,726
         8               802          3,622
        16               729          3,359
        32               319          2,826
        64               102          2,744"

    The changes are extensive and the patch-set has been through
    several iterations addressing various locking workloads. There
    might be more regressions, but unless they are pathological I
    believe we want to use this new implementation as the baseline
    going forward.

    - jump-label optimizations by Daniel Bristot de Oliveira: the primary
    motivation was to remove IPI disturbance of isolated RT-workload
    CPUs, which resulted in the implementation of batched jump-label
    updates. Beyond improving the kernel's real-time characteristics,
    in one test this patchset reduced static key update
    overhead from 57 msecs to just 1.4 msecs - which is a nice speedup
    as well.

    - atomic64_t cross-arch type cleanups by Mark Rutland: over the last
    ~10 years of atomic64_t existence the various types used by the
    APIs only had to be self-consistent within each architecture -
    which means they became wildly inconsistent across architectures.
    Mark puts an end to this by reworking all the atomic64
    implementations to use 's64' as the base type for atomic64_t, and
    to ensure that this type is consistently used for parameters and
    return values in the API, avoiding further problems in this area.

    - A large set of small improvements to lockdep by Yuyang Du: type
    cleanups, output cleanups, function return type and other cleanups
    all around the place.

    - A set of percpu ops cleanups and fixes by Peter Zijlstra.

    - Misc other changes - please see the Git log for more details"

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (82 commits)
    locking/lockdep: increase size of counters for lockdep statistics
    locking/atomics: Use sed(1) instead of non-standard head(1) option
    locking/lockdep: Move mark_lock() inside CONFIG_TRACE_IRQFLAGS && CONFIG_PROVE_LOCKING
    x86/jump_label: Make tp_vec_nr static
    x86/percpu: Optimize raw_cpu_xchg()
    x86/percpu, sched/fair: Avoid local_clock()
    x86/percpu, x86/irq: Relax {set,get}_irq_regs()
    x86/percpu: Relax smp_processor_id()
    x86/percpu: Differentiate this_cpu_{}() and __this_cpu_{}()
    locking/rwsem: Guard against making count negative
    locking/rwsem: Adaptive disabling of reader optimistic spinning
    locking/rwsem: Enable time-based spinning on reader-owned rwsem
    locking/rwsem: Make rwsem->owner an atomic_long_t
    locking/rwsem: Enable readers spinning on writer
    locking/rwsem: Clarify usage of owner's nonspinaable bit
    locking/rwsem: Wake up almost all readers in wait queue
    locking/rwsem: More optimal RT task handling of null owner
    locking/rwsem: Always release wait_lock before waking up tasks
    locking/rwsem: Implement lock handoff to prevent lock starvation
    locking/rwsem: Make rwsem_spin_on_owner() return owner state
    ...

    Linus Torvalds
     

05 Jul, 2019

1 commit


17 Jun, 2019

1 commit

  • All callers of lockdep_assert_held_exclusive() use it to verify the
    correct locking state of either a semaphore (ldisc_sem in tty,
    mmap_sem for perf events, i_rwsem of inode for dax) or rwlock by
    apparmor. Thus it makes sense to rename _exclusive to _write since
    that's the semantics callers care about. Additionally, there is already
    lockdep_assert_held_read(), which this new naming is more consistent with.

    No functional changes.
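
    A call site such as the dax one then only changes name; for example
    (illustrative):

        /* before */
        lockdep_assert_held_exclusive(&inode->i_rwsem);

        /* after: pairs naturally with lockdep_assert_held_read() */
        lockdep_assert_held_write(&inode->i_rwsem);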

    Signed-off-by: Nikolay Borisov
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: https://lkml.kernel.org/r/20190531100651.3969-1-nborisov@suse.com
    Signed-off-by: Ingo Molnar

    Nikolay Borisov
     

07 Jun, 2019

1 commit

  • When inserting an entry into the xarray, we store the mapping and index in
    the corresponding struct pages for memory error handling. When one process
    was mapping a file at PMD granularity while another process mapped it at
    PTE granularity, we could wrongly deassociate the PMD range and then
    reassociate only the PTE range, leaving the rest of the struct pages in the
    PMD range without mapping information, which could later cause missed
    notifications about memory errors. Fix the problem by calling the
    association / deassociation code if and only if we are really going to
    update the xarray (deassociating and associating zero or empty entries is
    just a no-op, so there's no reason to complicate the code by trying to
    avoid the calls for these cases).

    Cc:
    Fixes: d2c997c0f145 ("fs, dax: use page->mapping to warn if truncate...")
    Signed-off-by: Jan Kara
    Signed-off-by: Dan Williams

    Jan Kara
     

05 Jun, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms and conditions of the gnu general public license
    version 2 as published by the free software foundation this program
    is distributed in the hope it will be useful but without any
    warranty without even the implied warranty of merchantability or
    fitness for a particular purpose see the gnu general public license
    for more details

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 263 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Reviewed-by: Alexios Zavras
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190529141901.208660670@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

15 May, 2019

2 commits

  • MADV_DONTNEED is handled with mmap_sem taken in read mode. We call
    page_mkclean without holding mmap_sem.

    MADV_DONTNEED implies that pages in the region are unmapped and subsequent
    access to the pages in that range is handled as a new page fault. This
    implies that if we don't have parallel access to the region when
    MADV_DONTNEED is run, we expect those ranges to be unallocated.

    W.r.t. page_mkclean(), we need to make sure that we don't break the
    MADV_DONTNEED semantics. MADV_DONTNEED checks for pmd_none without holding
    the pmd_lock, which implies we would skip the pmd if we temporarily marked
    the pmd none. Avoid doing that while marking the page clean.

    Keep the sequence the same for dax too, even though we don't support
    MADV_DONTNEED for dax mappings.

    The bug was noticed by code review and I didn't observe any failures w.r.t
    test run. This is similar to

    commit 58ceeb6bec86d9140f9d91d71a710e963523d063
    Author: Kirill A. Shutemov
    Date: Thu Apr 13 14:56:26 2017 -0700

    thp: fix MADV_DONTNEED vs. MADV_FREE race

    commit ced108037c2aa542b3ed8b7afd1576064ad1362a
    Author: Kirill A. Shutemov
    Date: Thu Apr 13 14:56:20 2017 -0700

    thp: fix MADV_DONTNEED vs. numa balancing race

    Link: http://lkml.kernel.org/r/20190321040610.14226-1-aneesh.kumar@linux.ibm.com
    Signed-off-by: Aneesh Kumar K.V
    Reviewed-by: Andrew Morton
    Cc: Dan Williams
    Cc:"Kirill A . Shutemov"
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aneesh Kumar K.V
     
  • Starting with c6f3c5ee40c1 ("mm/huge_memory.c: fix modifying of page
    protection by insert_pfn_pmd()") vmf_insert_pfn_pmd() internally calls
    pmdp_set_access_flags(). That helper enforces a pmd aligned @address
    argument via VM_BUG_ON() assertion.

    Update the implementation to take a 'struct vm_fault' argument directly
    and apply the address alignment fixup internally to fix crash signatures
    like:

    kernel BUG at arch/x86/mm/pgtable.c:515!
    invalid opcode: 0000 [#1] SMP NOPTI
    CPU: 51 PID: 43713 Comm: java Tainted: G OE 4.19.35 #1
    [..]
    RIP: 0010:pmdp_set_access_flags+0x48/0x50
    [..]
    Call Trace:
    vmf_insert_pfn_pmd+0x198/0x350
    dax_iomap_fault+0xe82/0x1190
    ext4_dax_huge_fault+0x103/0x1f0
    ? __switch_to_asm+0x40/0x70
    __handle_mm_fault+0x3f6/0x1370
    ? __switch_to_asm+0x34/0x70
    ? __switch_to_asm+0x40/0x70
    handle_mm_fault+0xda/0x200
    __do_page_fault+0x249/0x4f0
    do_page_fault+0x32/0x110
    ? page_fault+0x8/0x30
    page_fault+0x1e/0x30
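
    The interface change, roughly (the old prototype is paraphrased):

        /* before: the caller supplied the address and pmd explicitly */
        vm_fault_t vmf_insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr,
                                      pmd_t *pmd, pfn_t pfn, bool write);

        /* after: take the fault context and align the address internally */
        vm_fault_t vmf_insert_pfn_pmd(struct vm_fault *vmf, pfn_t pfn, bool write);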

    Link: http://lkml.kernel.org/r/155741946350.372037.11148198430068238140.stgit@dwillia2-desk3.amr.corp.intel.com
    Fixes: c6f3c5ee40c1 ("mm/huge_memory.c: fix modifying of page protection by insert_pfn_pmd()")
    Signed-off-by: Dan Williams
    Reported-by: Piotr Balcer
    Tested-by: Yan Ma
    Tested-by: Pankaj Gupta
    Reviewed-by: Matthew Wilcox
    Reviewed-by: Jan Kara
    Reviewed-by: Aneesh Kumar K.V
    Cc: Chandan Rajendra
    Cc: Souptick Joarder
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
     

14 Mar, 2019

1 commit

  • Architectures like ppc64 use the deposited page table to store hardware
    page table slot information. Make sure we deposit a page table when
    using zero page at the pmd level for hash.

    Without this we hit

    Unable to handle kernel paging request for data at address 0x00000000
    Faulting instruction address: 0xc000000000082a74
    Oops: Kernel access of bad area, sig: 11 [#1]
    ....

    NIP [c000000000082a74] __hash_page_thp+0x224/0x5b0
    LR [c0000000000829a4] __hash_page_thp+0x154/0x5b0
    Call Trace:
    hash_page_mm+0x43c/0x740
    do_hash_page+0x2c/0x3c
    copy_from_iter_flushcache+0xa4/0x4a0
    pmem_copy_from_iter+0x2c/0x50 [nd_pmem]
    dax_copy_from_iter+0x40/0x70
    dax_iomap_actor+0x134/0x360
    iomap_apply+0xfc/0x1b0
    dax_iomap_rw+0xac/0x130
    ext4_file_write_iter+0x254/0x460 [ext4]
    __vfs_write+0x120/0x1e0
    vfs_write+0xd8/0x220
    SyS_write+0x6c/0x110
    system_call+0x3c/0x130

    Fixes: b5beae5e224f ("powerpc/pseries: Add driver for PAPR SCM regions")
    Cc:
    Reviewed-by: Jan Kara
    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Dan Williams

    Aneesh Kumar K.V
     

02 Mar, 2019

1 commit

  • The radix tree would rewind the index in an iterator to the lowest index
    of a multi-slot entry. The XArray iterators instead leave the index
    unchanged, but I overlooked that when converting DAX from the radix tree
    to the XArray. Adjust the index that we use for flushing to the start
    of the PMD range.

    Fixes: c1901cd33cf4 ("page cache: Convert find_get_entries_tag to XArray")
    Cc:
    Reported-by: Piotr Balcer
    Tested-by: Dan Williams
    Reviewed-by: Jan Kara
    Signed-off-by: Matthew Wilcox
    Signed-off-by: Dan Williams

    Matthew Wilcox
     

13 Feb, 2019

2 commits


01 Jan, 2019

1 commit


29 Dec, 2018

1 commit

  • To avoid having to change many call sites every time we want to add a
    parameter, use a structure to group all the parameters for the mmu_notifier
    invalidate_range_start/end calls. No functional changes with this patch.
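
    Call sites then group everything into one structure; a minimal sketch of
    the pattern, with the init helper's argument list abridged:

        struct mmu_notifier_range range;

        mmu_notifier_range_init(&range, mm, start, end);   /* arguments abridged */
        mmu_notifier_invalidate_range_start(&range);
        /* ... modify the page tables ... */
        mmu_notifier_invalidate_range_end(&range);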

    [akpm@linux-foundation.org: coding style fixes]
    Link: http://lkml.kernel.org/r/20181205053628.3210-3-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse
    Acked-by: Christian König
    Acked-by: Jan Kara
    Cc: Matthew Wilcox
    Cc: Ross Zwisler
    Cc: Dan Williams
    Cc: Paolo Bonzini
    Cc: Radim Krcmar
    Cc: Michal Hocko
    Cc: Felix Kuehling
    Cc: Ralph Campbell
    Cc: John Hubbard
    From: Jérôme Glisse
    Subject: mm/mmu_notifier: use structure for invalidate_range_start/end calls v3

    fix build warning in migrate.c when CONFIG_MMU_NOTIFIER=n

    Link: http://lkml.kernel.org/r/20181213171330.8489-3-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jérôme Glisse
     

22 Dec, 2018

1 commit

  • get_unlocked_entry() uses an exclusive wait because it is guaranteed to
    eventually obtain the lock and follow on with an unlock+wakeup cycle.
    The wait_entry_unlocked() path does not have the same guarantee. Rather
    than open-code an extra wakeup, just switch to a non-exclusive wait.
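
    The heart of the change is the choice of wait primitive in
    wait_entry_unlocked(); a sketch, with the surrounding code omitted:

        /* before: exclusive wait, so an extra wakeup had to be open-coded */
        prepare_to_wait_exclusive(wq, &ewait.wait, TASK_UNINTERRUPTIBLE);

        /* after: non-exclusive wait; a normal wake-up reaches every waiter */
        prepare_to_wait(wq, &ewait.wait, TASK_UNINTERRUPTIBLE);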

    Cc: Jan Kara
    Cc: Matthew Wilcox
    Reported-by: Linus Torvalds
    Signed-off-by: Dan Williams

    Dan Williams
     

05 Dec, 2018

1 commit

  • Internal to dax_unlock_mapping_entry(), dax_unlock_entry() is used to
    store a replacement entry in the Xarray at the given xas-index with the
    DAX_LOCKED bit clear. When called, dax_unlock_entry() expects the unlocked
    value of the entry relative to the current Xarray state to be specified.

    In most contexts dax_unlock_entry() is operating in the same scope as
    the matched dax_lock_entry(). However, in the dax_unlock_mapping_entry()
    case the implementation needs to recall the original entry. In the case
    where the original entry is a 'pmd' entry, it is possible that the pfn
    used to do the lookup is misaligned relative to the value retrieved from
    the Xarray.

    Change the api to return the unlock cookie from dax_lock_page() and pass
    it to dax_unlock_page(). This fixes a bug where dax_unlock_page() was
    assuming that the page was PMD-aligned if the entry was a PMD entry with
    signatures like:

    WARNING: CPU: 38 PID: 1396 at fs/dax.c:340 dax_insert_entry+0x2b2/0x2d0
    RIP: 0010:dax_insert_entry+0x2b2/0x2d0
    [..]
    Call Trace:
    dax_iomap_pte_fault.isra.41+0x791/0xde0
    ext4_dax_huge_fault+0x16f/0x1f0
    ? up_read+0x1c/0xa0
    __do_fault+0x1f/0x160
    __handle_mm_fault+0x1033/0x1490
    handle_mm_fault+0x18b/0x3d0
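
    With the new API the caller simply hands the cookie back; a minimal
    sketch, assuming a memory-failure-style user:

        dax_entry_t cookie;

        cookie = dax_lock_page(page);
        if (!cookie)
                return -EBUSY;          /* the entry could not be locked */
        /* ... operate on the locked entry ... */
        dax_unlock_page(page, cookie);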

    Link: https://lkml.kernel.org/r/20181130154902.GL10377@bombadil.infradead.org
    Fixes: 9f32d221301c ("dax: Convert dax_lock_mapping_entry to XArray")
    Reported-by: Dan Williams
    Signed-off-by: Matthew Wilcox
    Tested-by: Dan Williams
    Reviewed-by: Jan Kara
    Signed-off-by: Dan Williams

    Matthew Wilcox
     

29 Nov, 2018

2 commits

  • After we drop the i_pages lock, the inode can be freed at any time.
    The get_unlocked_entry() code has no choice but to reacquire the lock,
    so it can't be used here. Create a new wait_entry_unlocked() which takes
    care not to acquire the lock or dereference the address_space in any way.

    Fixes: c2a7d2a11552 ("filesystem-dax: Introduce dax_lock_mapping_entry()")
    Cc:
    Signed-off-by: Matthew Wilcox
    Reviewed-by: Jan Kara
    Signed-off-by: Dan Williams

    Matthew Wilcox
     
  • If we race with inode destroy, it's possible for page->mapping to be
    NULL before we even enter this routine, as well as after having slept
    waiting for the dax entry to become unlocked.

    Fixes: c2a7d2a11552 ("filesystem-dax: Introduce dax_lock_mapping_entry()")
    Cc:
    Reported-by: Jan Kara
    Signed-off-by: Matthew Wilcox
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Jan Kara
    Signed-off-by: Dan Williams

    Matthew Wilcox
     

19 Nov, 2018

1 commit


18 Nov, 2018

2 commits

  • Using xas_load() with a PMD-sized xa_state would work if either a
    PMD-sized entry was present or a PTE sized entry was present in the
    first 64 entries (of the 512 PTEs in a PMD on x86). If there was no
    PTE in the first 64 entries, grab_mapping_entry() would believe there
    were no entries present, allocate a PMD-sized entry and overwrite the
    PTE in the page cache.

    Use xas_find_conflict() instead which turns out to simplify
    both get_unlocked_entry() and grab_mapping_entry(). Also remove a
    WARN_ON_ONCE from grab_mapping_entry() as it will have already triggered
    in get_unlocked_entry().
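
    The key substitution, in sketch form (error handling omitted):

        /* before: only inspects the slot the PMD-sized xa_state points at */
        entry = xas_load(&xas);

        /* after: reports any entry anywhere in the range the xa_state covers */
        entry = xas_find_conflict(&xas);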

    Fixes: cfc93c6c6c96 ("dax: Convert dax_insert_pfn_mkwrite to XArray")
    Signed-off-by: Matthew Wilcox

    Matthew Wilcox
     
  • Device DAX PMD pages do not set the PageHead bit for compound pages.
    Fix for now by retrieving the PMD bit from the entry, but eventually we
    will be passed the page size by the caller.

    Reported-by: Dan Williams
    Fixes: 9f32d221301c ("dax: Convert dax_lock_mapping_entry to XArray")
    Signed-off-by: Matthew Wilcox

    Matthew Wilcox
     

17 Nov, 2018

1 commit

  • For the device-dax case, it is possible that the inode can go away
    underneath us. The rcu_read_lock() was there to prevent it from
    being freed, and not (as I thought) to protect the tree. Bring back
    the rcu_read_lock() protection. Also add a little kernel-doc; while
    this function is not exported to modules, it is used from outside dax.c.

    Reported-by: Dan Williams
    Fixes: 9f32d221301c ("dax: Convert dax_lock_mapping_entry to XArray")
    Signed-off-by: Matthew Wilcox

    Matthew Wilcox