Eric Lee / smarc-fsl-linux-kernel

30 Dec, 2020

1 commit

224adad2c device-dax/core: Fix memory leak when rmmod dax.ko

commit 1aa574312518ef1d60d2dc62d58f7021db3b163a upstream.

When I repeatedly modprobe and rmmod dax.ko, kmemleak reports a
memory leak as follows:

unreferenced object 0xffff9a5588c05088 (size 8):
comm "modprobe", pid 261, jiffies 4294693644 (age 42.063s)
...
backtrace:
[] kstrdup+0x35/0x70
[] kstrdup_const+0x3d/0x50
[] kvasprintf_const+0xbc/0xf0
[] kobject_set_name_vargs+0x3b/0xd0
[] kobject_set_name+0x62/0x90
[] bus_register+0x7f/0x2b0
[] 0xffffffffc02840f7
[] 0xffffffffc02840b4
[] do_one_initcall+0x58/0x240
[] do_init_module+0x56/0x1e2
[] load_module+0x2517/0x2840
[] __do_sys_finit_module+0x9c/0xe0
[] do_syscall_64+0x33/0x40
[] entry_SYSCALL_64_after_hwframe+0x44/0xa9

When rmmod dax is executed, dax_bus_exit() is never called, so the bus
registered at module init is left behind. This patch adds the missing
call, which fixes the leak.
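
The pairing problem described above can be sketched in userspace C. This is only a model under stated assumptions: fake_bus_register(), fake_bus_unregister(), the model_dax_* functions and the live_allocations counter are hypothetical stand-ins, not kernel APIs. The counter plays the role of what kmemleak would report.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
int live_allocations;          /* non-zero after exit means a leak */

struct fake_bus {
    char *name;                /* allocated at registration time */
};

int fake_bus_register(struct fake_bus *bus, const char *name)
{
    size_t len = strlen(name) + 1;

    bus->name = malloc(len);
    if (!bus->name)
        return -1;
    memcpy(bus->name, name, len);
    live_allocations++;
    return 0;
}

void fake_bus_unregister(struct fake_bus *bus)
{
    assert(bus->name != NULL);
    free(bus->name);
    bus->name = NULL;
    live_allocations--;
}

struct fake_bus model_dax_bus;

/* Module init: registers the bus. */
int model_dax_init(void)
{
    return fake_bus_register(&model_dax_bus, "dax");
}

/* Module exit before the fix: the bus is never unregistered. */
void model_dax_exit_leaky(void)
{
}

/* Module exit after the fix: undo what init did. */
void model_dax_exit_fixed(void)
{
    fake_bus_unregister(&model_dax_bus);
}
```

With the leaky exit, every load/unload cycle leaves one allocation behind; the fixed exit brings the counter back to zero.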

Fixes: 9567da0b408a ("device-dax: Introduce bus + driver model")
Cc:
Reported-by: Hulk Robot
Signed-off-by: Wang Hai
Link: https://lore.kernel.org/r/20201201135929.66530-1-wanghai38@huawei.com
Signed-off-by: Dan Williams
Signed-off-by: Greg Kroah-Hartman

Wang Hai
2020-12-30 18:54:26 +0800

20 Oct, 2020

1 commit

694565356 Merge tag 'fuse-update-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

Pull fuse updates from Miklos Szeredi:

- Support directly accessing host page cache from virtiofs. This can
improve I/O performance for various workloads, as well as reducing
the memory requirement by eliminating double caching. Thanks to Vivek
Goyal for doing most of the work on this.

- Allow automatic submounting inside virtiofs. This allows unique
st_dev/st_ino values to be assigned inside the guest to files
residing on different filesystems on the host. Thanks to Max Reitz
for the patches.

- Fix an old use-after-free bug found by Pradeep P V K.

* tag 'fuse-update-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (25 commits)
virtiofs: calculate number of scatter-gather elements accurately
fuse: connection remove fix
fuse: implement crossmounts
fuse: Allow fuse_fill_super_common() for submounts
fuse: split fuse_mount off of fuse_conn
fuse: drop fuse_conn parameter where possible
fuse: store fuse_conn in fuse_req
fuse: add submount support to
fuse: fix page dereference after free
virtiofs: add logic to free up a memory range
virtiofs: maintain a list of busy elements
virtiofs: serialize truncate/punch_hole and dax fault path
virtiofs: define dax address space operations
virtiofs: add DAX mmap support
virtiofs: implement dax read/write operations
virtiofs: introduce setupmapping/removemapping commands
virtiofs: implement FUSE_INIT map_alignment field
virtiofs: keep a list of free dax memory ranges
virtiofs: add a mount option to enable dax
virtiofs: set up virtio_fs dax_device
...

Linus Torvalds
2020-10-20 05:28:30 +0800

20 Sep, 2020

2 commits

d4c5da504 dax: Fix stack overflow when mounting fsdax pmem device

When mounting an fsdax pmem device, commit 6180bb446ab6 ("dax: fix
detection of dax support for non-persistent memory block devices")
introduces a stack overflow [1][2]. Here is the call path for
mounting an ext4 file system:
  ext4_fill_super
    bdev_dax_supported
      __bdev_dax_supported
        dax_supported
          generic_fsdax_supported
            __generic_fsdax_supported
              bdev_dax_supported

This call path recurses without end, so __generic_fsdax_supported()
must not call bdev_dax_supported(). The sanity check of the variable
'dax_dev' is moved ahead of the two bdev_dax_pgoff() checks [3][4].
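
The ordering of checks described above can be illustrated with a small userspace model. It is a sketch only: the model_* types and functions are hypothetical simplifications, and the alignment test merely stands in for the bdev_dax_pgoff() checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE   4096u
#define MODEL_SECTOR_SIZE 512u

/* Hypothetical, simplified types; the real kernel structures differ. */
struct model_dax_dev {
    int id;
};

struct model_bdev {
    struct model_dax_dev *dax_dev;   /* NULL when there is no DAX support */
    unsigned int blocksize;
    unsigned long long start_sector;
};

/* Leaf helper, standing in for __generic_fsdax_supported(). */
bool model_generic_fsdax_supported(const struct model_bdev *bdev)
{
    assert(bdev != NULL);

    if (bdev->blocksize != MODEL_PAGE_SIZE)
        return false;

    /*
     * Check dax_dev first and return directly. Calling back into the
     * top-level helper from here is what produced the endless recursion.
     */
    if (bdev->dax_dev == NULL)
        return false;

    /* Stand-in for the offset checks: the start must be page aligned. */
    if ((bdev->start_sector * MODEL_SECTOR_SIZE) % MODEL_PAGE_SIZE != 0)
        return false;

    return true;
}

/* Top-level entry point, standing in for bdev_dax_supported(). */
bool model_bdev_dax_supported(const struct model_bdev *bdev)
{
    return model_generic_fsdax_supported(bdev);
}
```

Because the leaf helper never calls the top-level entry point, the call chain is strictly one-directional.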

[1] https://lore.kernel.org/linux-nvdimm/1420999447.1004543.1600055488770.JavaMail.zimbra@redhat.com/
[2] https://lore.kernel.org/linux-nvdimm/alpine.LRH.2.02.2009141131220.30651@file01.intranet.prod.int.rdu2.redhat.com/
[3] https://lore.kernel.org/linux-nvdimm/CA+RJvhxBHriCuJhm-D8NvJRe3h2MLM+ZMFgjeJjrRPerMRLvdg@mail.gmail.com/
[4] https://lore.kernel.org/linux-nvdimm/20200903160608.GU878166@iweiny-DESK2.sc.intel.com/

Fixes: 6180bb446ab6 ("dax: fix detection of dax support for non-persistent memory block devices")
Reported-by: Yi Zhang
Reported-by: Mikulas Patocka
Signed-off-by: Adrian Huang
Reviewed-by: Jan Kara
Tested-by: Ritesh Harjani
Cc: Coly Li
Cc: Ira Weiny
Cc: John Pittman
Link: https://lore.kernel.org/r/20200917111549.6367-1-adrianhuang0701@gmail.com
Signed-off-by: Dan Williams

Adrian Huang
2020-09-20 23:57:36 +0800

e2ec51282 dm: Call proper helper to determine dax support

DM was calling generic_fsdax_supported() to determine whether a device
referenced in the DM table supports DAX. However, this is a helper for
"leaf" device drivers so that they don't have to duplicate common
generic checks. High-level code should call the dax_supported() helper,
which calls into the appropriate helper for the particular device. This
problem manifested itself as kernel messages:

dm-3: error: dax access failed (-95)

when lvm2-testsuite ran in cases where a DM device was stacked on top of
another DM device.

Fixes: 7bf7eac8d648 ("dax: Arrange for dax_supported check to span multiple devices")
Cc:
Tested-by: Adrian Huang
Signed-off-by: Jan Kara
Acked-by: Mike Snitzer
Reported-by: kernel test robot
Link: https://lore.kernel.org/r/160061715195.13131.5503173247632041975.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams

Jan Kara
2020-09-20 23:55:09 +0800

10 Sep, 2020

1 commit

1a9d5d405 dax: Modify bdev_dax_pgoff() to handle NULL bdev

virtiofs does not have a block device but it has a dax device.
Modify bdev_dax_pgoff() to be able to handle that.

If there is no bdev, the dax offset is 0. (It can't be a partition
block device starting at an offset in the dax device.)
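
A minimal userspace sketch of the NULL handling described above follows. The struct, the constants and the model_bdev_dax_pgoff() name are assumptions made for illustration; only the idea (no bdev means a partition start of 0) comes from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_SECTOR_SIZE 512u
#define MODEL_PAGE_SIZE   4096u

/* Hypothetical stand-in for struct block_device: only the partition start. */
struct model_bdev {
    uint64_t start_sector;
};

/*
 * Translate (bdev, sector) into a page offset within the dax device.
 * A NULL bdev means there is no partition, so the offset starts at 0.
 * Returns 0 on success or -EINVAL if the range is not page aligned.
 */
int model_bdev_dax_pgoff(const struct model_bdev *bdev, uint64_t sector,
                         size_t size, uint64_t *pgoff)
{
    uint64_t start_sect = bdev ? bdev->start_sector : 0;
    uint64_t byte_off = (start_sect + sector) * MODEL_SECTOR_SIZE;

    if (pgoff)
        *pgoff = byte_off / MODEL_PAGE_SIZE;
    if (byte_off % MODEL_PAGE_SIZE || size % MODEL_PAGE_SIZE)
        return -EINVAL;
    return 0;
}
```

A caller without a block device, such as virtiofs in this model, simply passes NULL and gets offsets relative to the start of the dax device.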

This is a little hackish. There have been discussions about dropping
partition support for dax devices.

https://lore.kernel.org/linux-fsdevel/20200107125159.GA15745@infradead.org/

IMHO, this path can easily break existing users. For example
ioctl(BLKPG_ADD_PARTITION) will start breaking on block devices
supporting DAX. Also, I personally find it very useful to be able to
partition dax devices and still be able to use DAX.

Alternatively, I tried to store offset into dax device information in iomap
interface, but that got NACKed.

https://lore.kernel.org/linux-fsdevel/20200217133117.GB20444@infradead.org/

I can't think of a good path to solve this issue properly. So to make
progress, it seems this patch is the least bad option for now and I hope
we can take it.

Signed-off-by: Stefan Hajnoczi
Signed-off-by: Vivek Goyal
Reviewed-by: Jan Kara
Cc: Christoph Hellwig
Cc: Dan Williams
Cc: Jan Kara
Cc: Vishal L Verma
Cc: "Weiny, Ira"
Cc: linux-nvdimm@lists.01.org
Signed-off-by: Miklos Szeredi

Vivek Goyal
2020-09-10 17:39:22 +0800

04 Sep, 2020

1 commit

6180bb446 dax: fix detection of dax support for non-persistent memory block devices

When calling __generic_fsdax_supported(), a dax-unsupported device may
not have dax_dev set to NULL, e.g. when the dax-related code block is not
enabled by Kconfig.

Therefore in __generic_fsdax_supported(), to check whether a device
supports DAX or not, the following order of operations should be
performed:
- If the dax_dev pointer is NULL, the device driver has explicitly
  announced that it doesn't support DAX. It is then OK to return
  false directly from __generic_fsdax_supported().
- If the dax_dev pointer is NOT NULL, it might be because the driver
  doesn't support DAX and did not explicitly initialize the related
  data structure. Then bdev_dax_supported() should be called for a
  further check.

If a device driver doesn't explicitly set its dax_dev pointer to NULL,
this is not a bug. Calling bdev_dax_supported() makes sure such devices
are recognized as dax-unsupported eventually.

Fixes: c2affe920b0e ("dax: do not print error message for non-persistent memory block device")
Cc: Jan Kara
Cc: Vishal Verma
Reviewed-and-tested-by: Adrian Huang
Reviewed-by: Ira Weiny
Reviewed-by: Mike Snitzer
Reviewed-by: Pankaj Gupta
Signed-off-by: Coly Li
Signed-off-by: Vishal Verma
Link: https://lore.kernel.org/r/20200903161625.19524-1-colyli@suse.de

Coly Li
2020-09-04 02:28:03 +0800

21 Aug, 2020

1 commit

c2affe920 dax: do not print error message for non-persistent memory block device

Commit 231609785cbf ("dax: print error message by pr_info()
in __generic_fsdax_supported()") prints the following error
messages during boot when non-persistent memory block devices
are configured by device mapper. The messages are caused by the
variable 'dax_dev' being NULL. Users might be confused by them
since they do not use a persistent memory device. Moreover, users
might worry about "what's wrong with my disks" because they see
the 'error' and 'failed' keywords.

# dmesg | grep fail
sdk3: error: dax access failed (-95)
sdk3: error: dax access failed (-95)
sdk3: error: dax access failed (-95)
sdk3: error: dax access failed (-95)
sdk3: error: dax access failed (-95)
sdk3: error: dax access failed (-95)
sdk3: error: dax access failed (-95)
sdk3: error: dax access failed (-95)
sdk3: error: dax access failed (-95)
sdb3: error: dax access failed (-95)
sdb3: error: dax access failed (-95)
sdb3: error: dax access failed (-95)
sdb3: error: dax access failed (-95)
sdb3: error: dax access failed (-95)
sdb3: error: dax access failed (-95)
sdb3: error: dax access failed (-95)
sdb3: error: dax access failed (-95)
sdb3: error: dax access failed (-95)

# lsblk
NAME            MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                 8:0  0   1.1T  0 disk
├─sda1              8:1  0   156M  0 part
├─sda2              8:2  0    40G  0 part
└─sda3              8:3  0   1.1T  0 part
sdb                8:16  0   1.1T  0 disk
├─sdb1             8:17  0   600M  0 part
├─sdb2             8:18  0     1G  0 part
└─sdb3             8:19  0   1.1T  0 part
  ├─rhel00-swap   254:3  0     4G  0 lvm
  ├─rhel00-home   254:4  0     1T  0 lvm
  └─rhel00-root   254:5  0    50G  0 lvm
sdc                8:32  0   1.1T  0 disk
sdd                8:48  0   1.1T  0 disk
sde                8:64  0   1.1T  0 disk
sdf                8:80  0   1.1T  0 disk
sdg                8:96  0   1.1T  0 disk
sdh               8:112  0   3.3T  0 disk
├─sdh1            8:113  0   500M  0 part /boot/efi
├─sdh2            8:114  0    40G  0 part /
├─sdh3            8:115  0   2.9T  0 part /home
└─sdh4            8:116  0 314.6G  0 part [SWAP]
sdi               8:128  0   1.1T  0 disk
sdj               8:144  0   3.3T  0 disk
├─sdj1            8:145  0   512M  0 part
└─sdj2            8:146  0   3.3T  0 part
sdk               8:160  0 119.2G  0 disk
├─sdk1            8:161  0   200M  0 part
├─sdk2            8:162  0     1G  0 part
└─sdk3            8:163  0   118G  0 part
  ├─rhel-swap     254:0  0     4G  0 lvm
  ├─rhel-home     254:1  0    64G  0 lvm
  └─rhel-root     254:2  0    50G  0 lvm
sdl               8:176  0 119.2G  0 disk

The call path is shown as follows:
  dm_table_determine_type
    dm_table_supports_dax
      device_supports_dax
        generic_fsdax_supported
          __generic_fsdax_supported

With the disk configuration listing from the command 'lsblk',
the member 'dev->dax_dev' of the block devices 'sdb3' and 'sdk3'
(configured by device mapper) is NULL in function
generic_fsdax_supported() because the member is configured in
function open_table_device().

To prevent the confusing error messages in this scenario (which is
normal behavior), print them with pr_debug() when dax_dev is NULL
and the block device does not support DAX.

Link: https://lore.kernel.org/r/20200819154236.24191-1-adrianhuang0701@gmail.com
Fixes: 231609785cbf ("dax: print error message by pr_info() in __generic_fsdax_supported()")
Cc: Coly Li
Cc: Dan Williams
Cc: Alasdair Kergon
Cc: Mike Snitzer
Acked-by: Coly Li
Signed-off-by: Adrian Huang
Signed-off-by: Vishal Verma

Adrian Huang
2020-08-21 01:43:18 +0800

12 Aug, 2020

1 commit

4bf5e3611 Merge tag 'libnvdimm-for-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Vishal Verma:
"You'd normally receive this pull request from Dan Williams, but he's
busy watching a newborn (Congrats Dan!), so I'm watching libnvdimm
this cycle.

This adds a new feature in libnvdimm - 'Runtime Firmware Activation',
and a few small cleanups and fixes in libnvdimm and DAX. I'd
originally intended to make separate topic-based pull requests - one
for libnvdimm, and one for DAX, but some of the DAX material fell out
since it wasn't quite ready.

Summary:

- add 'Runtime Firmware Activation' support for NVDIMMs that
advertise the relevant capability

- misc libnvdimm and DAX cleanups"

* tag 'libnvdimm-for-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
libnvdimm/security: ensure sysfs poll thread woke up and fetch updated attr
libnvdimm/security: the 'security' attr never show 'overwrite' state
libnvdimm/security: fix a typo
ACPI: NFIT: Fix ARS zero-sized allocation
dax: Fix incorrect argument passed to xas_set_err()
ACPI: NFIT: Add runtime firmware activate support
PM, libnvdimm: Add runtime firmware activation support
libnvdimm: Convert to DEVICE_ATTR_ADMIN_RO()
drivers/dax: Expand lock scope to cover the use of addresses
fs/dax: Remove unused size parameter
dax: print error message by pr_info() in __generic_fsdax_supported()
driver-core: Introduce DEVICE_ATTR_ADMIN_{RO,RW}
tools/testing/nvdimm: Emulate firmware activation commands
tools/testing/nvdimm: Prepare nfit_ctl_test() for ND_CMD_CALL emulation
tools/testing/nvdimm: Add command debug messages
tools/testing/nvdimm: Cleanup dimm index passing
ACPI: NFIT: Define runtime firmware activation commands
ACPI: NFIT: Move bus_dsm_mask out of generic nvdimm_bus_descriptor
libnvdimm: Validate command family indices

Linus Torvalds
2020-08-12 01:59:19 +0800

29 Jul, 2020

2 commits

eedfd73d4 drivers/dax: Expand lock scope to cover the use of addresses

The addition of PKS protection to dax read lock/unlock will require that
the address returned by dax_direct_access() be protected by this lock.

Correct the locking by ensuring that the use of kaddr and end_kaddr
are covered by the dax read lock/unlock.

Link: https://lore.kernel.org/r/20200717072056.73134-12-ira.weiny@intel.com
Reviewed-by: Dan Williams
Signed-off-by: Ira Weiny
Signed-off-by: Vishal Verma

Ira Weiny
2020-07-29 01:50:08 +0800

231609785 dax: print error message by pr_info() in __generic_fsdax_supported()

In struct dax_operations, the callback routine dax_supported() returns
a bool result. For a false return value, the caller has no idea
whether the device does not support dax at all, or whether it is just a
misconfiguration issue.

An example is formatting an Ext4 file system on a pmem device on top of
an NVDIMM namespace with
  # mkfs.ext4 /dev/pmem0
If the fs block size does not match the kernel memory page size (which
is possible on non-x86 platforms), mounting this Ext4 file system fails:
  # mount -o dax /dev/pmem0 /mnt
  mount: /mnt: wrong fs type, bad option, bad superblock on /dev/pmem0,
  missing codepage or helper program, or other error.
And the dmesg output contains only the following information:
  [ 307.853148] EXT4-fs (pmem0): DAX unsupported by block device.

The above information is quite confusing, because the pmem0 device
definitely supports dax operation, and the super block is consistent
with how it was created by mkfs.ext4.

In fact the failure comes from the following piece of code in
__generic_fsdax_supported():
    if (blocksize != PAGE_SIZE) {
        pr_debug("%s: error: unsupported blocksize for dax\n",
                 bdevname(bdev, buf));
        return false;
    }
It is because the Ext4 block size is 4KB and the kernel page size is 8KB
or 16KB.

It is not simple to make dax_supported() from struct dax_operations
or __generic_fsdax_supported() return the exact failure type right now.
So the simplest fix is to use pr_info() to print all the error messages
inside __generic_fsdax_supported(). Then users can at least find an
informative clue in the kernel messages.

Messages printed by pr_debug() are very easy for users to overlook. This
patch prints the error messages with pr_info() in
__generic_fsdax_supported(), so when the mount fails the following lines
can be found in the dmesg output:
  [ 2705.500885] pmem0: error: unsupported blocksize for dax
  [ 2705.500888] EXT4-fs (pmem0): DAX unsupported by block device.
Now users can tell that the mount failure comes from the pmem driver
rejecting an unsupported block size.

Link: https://lore.kernel.org/r/20200725162450.95999-1-colyli@suse.de
Cc: Dan Williams
Cc: Anthony Iliopoulos
Reported-by: Michal Suchanek
Suggested-by: Jan Kara
Reviewed-by: Jan Kara
Reviewed-by: Ira Weiny
Reviewed-by: Pankaj Gupta
Signed-off-by: Coly Li
Signed-off-by: Vishal Verma

Coly Li
2020-07-29 01:49:27 +0800

01 Jul, 2020

1 commit

e556f6ba1 block: remove the bd_queue field from struct block_device

Just use bd_disk->queue instead.

Reviewed-by: Johannes Thumshirn
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2020-07-01 22:08:20 +0800

03 Apr, 2020

2 commits

4e4ced937 dax: Move mandatory ->zero_page_range() check in alloc_dax()

The zero_page_range() dax operation is mandatory for dax devices. Right now
that check happens in the dax_zero_page_range() function. Dan thinks that's
too late and it's better to do the check earlier in alloc_dax().

I also modified alloc_dax() to return a pointer with the error code in it in
case of failure. Right now it returns NULL and the caller assumes the failure
happened due to -ENOMEM. But with this ->zero_page_range() check, I
need to return -EINVAL instead.
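
The error-pointer convention mentioned above can be re-created in userspace to show why the caller can now tell -EINVAL from -ENOMEM. Everything prefixed model_ is a hypothetical stand-in written for this sketch; the kernel's own helpers for this idiom are ERR_PTR(), IS_ERR() and PTR_ERR().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace re-creation of the error-pointer idiom (assumed layout). */
#define MODEL_MAX_ERRNO 4095

static inline void *model_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

static inline long model_ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

static inline int model_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

struct model_dax_ops {
    int (*zero_page_range)(void *dev, uint64_t pgoff, size_t nr_pages);
};

struct model_dax_dev {
    const struct model_dax_ops *ops;
};

/* Trivial operation used only to build a valid ops table. */
int model_zero_noop(void *dev, uint64_t pgoff, size_t nr_pages)
{
    (void)dev;
    (void)pgoff;
    (void)nr_pages;
    return 0;
}

/* Returns a valid device, or an error pointer carrying -EINVAL / -ENOMEM. */
struct model_dax_dev *model_alloc_dax(const struct model_dax_ops *ops)
{
    struct model_dax_dev *dev;

    /* The mandatory-operation check happens up front, at allocation time. */
    if (ops && !ops->zero_page_range)
        return model_err_ptr(-EINVAL);

    dev = calloc(1, sizeof(*dev));
    if (!dev)
        return model_err_ptr(-ENOMEM);
    dev->ops = ops;
    return dev;
}
```

A caller checks model_is_err() on the result and, on failure, reads the specific code with model_ptr_err() instead of assuming an out-of-memory condition.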

Signed-off-by: Vivek Goyal
Link: https://lore.kernel.org/r/20200401161125.GB9398@redhat.com
Signed-off-by: Dan Williams

Vivek Goyal
2020-04-03 10:15:03 +0800

f605a263e dax, pmem: Add a dax operation zero_page_range

Add a dax operation zero_page_range, to zero a page. This will also clear any
known poison in the page being zeroed.

As of now, zeroing of only one page is allowed in a single call. There
are no callers which try to zero more than a page in a single call.
Once callers appear that zero more than a page in a single call, we
can add that support. The primary reason for not doing that yet is that it
would add a little complexity to the dm implementation, where a range might
span multiple underlying targets and one would have to split the range
into multiple sub-ranges and call zero_page_range() on individual targets.

Suggested-by: Christoph Hellwig
Signed-off-by: Vivek Goyal
Reviewed-by: Pankaj Gupta
Link: https://lore.kernel.org/r/20200228163456.1587-3-vgoyal@redhat.com
Signed-off-by: Dan Williams

Vivek Goyal
2020-04-03 10:15:03 +0800

17 Jan, 2020

1 commit

f01b16a85 dax: Get rid of fs_dax_get_by_host() helper

Looks like nobody is using fs_dax_get_by_host() except fs_dax_get_by_bdev()
and it can easily use dax_get_by_host() instead.

IIUC, fs_dax_get_by_host() was only introduced so that one could compile
with CONFIG_FS_DAX=n and CONFIG_DAX=m. fs_dax_get_by_bdev() achieves
the same purpose and hence it looks like fs_dax_get_by_host() is not
needed anymore.

Signed-off-by: Vivek Goyal
Reviewed-by: Christoph Hellwig
Link: https://lore.kernel.org/r/20200106181117.GA16248@redhat.com
Signed-off-by: Dan Williams

Vivek Goyal
2020-01-17 01:52:27 +0800

20 Jul, 2019

1 commit

933a90bf4 Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs mount updates from Al Viro:
"The first part of mount updates.

Convert filesystems to use the new mount API"

* 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
mnt_init(): call shmem_init() unconditionally
constify ksys_mount() string arguments
don't bother with registering rootfs
init_rootfs(): don't bother with init_ramfs_fs()
vfs: Convert smackfs to use the new mount API
vfs: Convert selinuxfs to use the new mount API
vfs: Convert securityfs to use the new mount API
vfs: Convert apparmorfs to use the new mount API
vfs: Convert openpromfs to use the new mount API
vfs: Convert xenfs to use the new mount API
vfs: Convert gadgetfs to use the new mount API
vfs: Convert oprofilefs to use the new mount API
vfs: Convert ibmasmfs to use the new mount API
vfs: Convert qib_fs/ipathfs to use the new mount API
vfs: Convert efivarfs to use the new mount API
vfs: Convert configfs to use the new mount API
vfs: Convert binfmt_misc to use the new mount API
convenience helper: get_tree_single()
convenience helper get_tree_nodev()
vfs: Kill sget_userns()
...

Linus Torvalds
2019-07-20 01:42:02 +0800

06 Jul, 2019

1 commit

fefc1d97f libnvdimm: add dax_dev sync flag

This patch adds the 'DAXDEV_SYNC' flag, which is set
for an nd_region doing synchronous flush. It is later
used to disable MAP_SYNC functionality in the ext4 and
xfs filesystems for devices that don't support
synchronous flush.
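
How such a flag could gate MAP_SYNC mappings is sketched below in userspace C. The flag values, the struct and both function names are assumptions chosen for illustration and do not reproduce the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for this model only. */
#define MODEL_DAXDEV_SYNC (1u << 0)   /* device flushes synchronously */
#define MODEL_VM_SYNC     (1u << 1)   /* mapping was requested with MAP_SYNC */

struct model_dax_dev {
    unsigned int flags;
};

/* True when the device advertises synchronous flush. */
bool model_dax_synchronous(const struct model_dax_dev *dev)
{
    return dev != NULL && (dev->flags & MODEL_DAXDEV_SYNC) != 0;
}

/* A filesystem mmap path would refuse the mapping when this returns false. */
bool model_mapping_supported(unsigned int vm_flags,
                             const struct model_dax_dev *dev)
{
    /* Mappings that did not ask for MAP_SYNC are always acceptable. */
    if (!(vm_flags & MODEL_VM_SYNC))
        return true;
    return model_dax_synchronous(dev);
}
```

Only the combination of a MAP_SYNC request and a device without the sync flag is rejected in this model; everything else passes through unchanged.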

Signed-off-by: Pankaj Gupta
Signed-off-by: Dan Williams

Pankaj Gupta
2019-07-06 06:19:10 +0800

05 Jun, 2019

1 commit

5b497af42 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 295

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of version 2 of the gnu general public license as
published by the free software foundation this program is
distributed in the hope that it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-only

has been chosen to replace the boilerplate/reference in 64 file(s).

Signed-off-by: Thomas Gleixner
Reviewed-by: Alexios Zavras
Reviewed-by: Allison Randal
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190529141901.894819585@linutronix.de
Signed-off-by: Greg Kroah-Hartman

Thomas Gleixner
2019-06-05 23:36:38 +0800

26 May, 2019

2 commits

75d4e06f0 vfs: Convert dax to use the new mount API

Convert the dax filesystem to the new internal mount API as the old
one will be obsoleted and removed. This allows greater flexibility in
communication of mount parameters between userspace, the VFS and the
filesystem.

See Documentation/filesystems/mount_api.txt for more information.

Signed-off-by: David Howells
cc: Dan Williams
cc: Vishal Verma
cc: Keith Busch
cc: Dave Jiang
cc: linux-nvdimm@lists.01.org
Signed-off-by: Al Viro

David Howells
2019-05-26 06:06:12 +0800

1f58bb18f mount_pseudo(): drop 'name' argument, switch to d_make_root()

Once upon a time we used to set ->d_name of e.g. pipefs root
so that d_path() on pipes would work. These days it's
completely pointless - dentries of pipes are not even connected
to pipefs root. However, mount_pseudo() had set the root
dentry name (passed as the second argument) and callers
kept inventing names to pass to it. Including those that
didn't *have* any non-root dentries to start with...

All of that had been pointless for about 8 years now; it's
time to get rid of that cargo-culting...

Signed-off-by: Al Viro

Al Viro
2019-05-26 05:59:24 +0800

21 May, 2019

2 commits

1a6e9e76b device-dax: Drop register_filesystem()

The device-dax fs is only there to allocate a common inode for each
device-node that refers to the same device by major:minor. It is
otherwise not user mountable and need not be displayed in
/proc/filesystems.

Reported-by: Al Viro
Acked-by: Al Viro
Signed-off-by: Dan Williams
Signed-off-by: Al Viro

Dan Williams
2019-05-21 15:23:41 +0800

7bf7eac8d dax: Arrange for dax_supported check to span multiple devices

Pankaj reports that starting with commit ad428cdb525a "dax: Check the
end of the block-device capacity with dax_direct_access()" device-mapper
no longer allows dax operation. This results from the stricter checks in
__bdev_dax_supported() that validate that the start and end of a
block-device map to the same 'pagemap' instance.

Teach the dax-core and device-mapper to validate the 'pagemap' on a
per-target basis. This is accomplished by refactoring the
bdev_dax_supported() internals into generic_fsdax_supported() which
takes a sector range to validate. Consequently generic_fsdax_supported()
is suitable to be used in a device-mapper ->iterate_devices() callback.
A new ->dax_supported() operation is added to allow composite devices to
split and route upper-level bdev_dax_supported() requests.

Fixes: ad428cdb525a ("dax: Check the end of the block-device...")
Cc:
Cc: Ira Weiny
Cc: Dave Jiang
Cc: Keith Busch
Cc: Matthew Wilcox
Cc: Vishal Verma
Cc: Heiko Carstens
Cc: Martin Schwidefsky
Reviewed-by: Jan Kara
Reported-by: Pankaj Gupta
Reviewed-by: Pankaj Gupta
Tested-by: Pankaj Gupta
Tested-by: Vaibhav Jain
Reviewed-by: Mike Snitzer
Signed-off-by: Dan Williams

Dan Williams
2019-05-21 06:02:08 +0800

02 May, 2019

1 commit

53e228299 dax: make use of ->free_inode()

we might want to drop ->destroy_inode() there - it's used only for
WARN_ON() now, and AFAICS that could be moved to ->evict_inode()
if we had one...

Reviewed-by: Jan Kara
Acked-by: Dan Williams
Signed-off-by: Al Viro

Al Viro
2019-05-02 10:43:26 +0800

17 Mar, 2019

1 commit

f67e3fb48 Merge tag 'devdax-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull device-dax updates from Dan Williams:
"New device-dax infrastructure to allow persistent memory and other
"reserved" / performance differentiated memories, to be assigned to
the core-mm as "System RAM".

Some users want to use persistent memory as additional volatile
memory. They are willing to cope with potential performance
differences, for example between DRAM and 3D Xpoint, and want to use
typical Linux memory management apis rather than a userspace memory
allocator layered over an mmap() of a dax file. The administration
model is to decide how much Persistent Memory (pmem) to use as System
RAM, create a device-dax-mode namespace of that size, and then assign
it to the core-mm. The rationale for device-dax is that it is a
generic memory-mapping driver that can be layered over any "special
purpose" memory, not just pmem. On subsequent boots udev rules can be
used to restore the memory assignment.

One implication of using pmem as RAM is that mlock() no longer keeps
data off persistent media. For this reason it is recommended to enable
NVDIMM Security (previously merged for 5.0) to encrypt pmem contents
at rest. We considered making this recommendation an actively enforced
requirement, but in the end decided to leave it as a distribution /
administrator policy to allow for emulation and test environments that
lack security capable NVDIMMs.

Summary:

- Replace the /sys/class/dax device model with /sys/bus/dax, and
include a compat driver so distributions can opt-in to the new ABI.

- Allow for an alternative driver for the device-dax address-range

- Introduce the 'kmem' driver to hotplug / assign a device-dax
address-range to the core-mm.

- Arrange for the device-dax target-node to be onlined so that the
newly added memory range can be uniquely referenced by numa apis"

NOTE! I'm not entirely happy with the whole "PMEM as RAM" model because
we currently have special - and very annoying rules in the kernel about
accessing PMEM only with the "MC safe" accessors, because machine checks
inside the regular repeat string copy functions can be fatal in some
(not described) circumstances.

And apparently the PMEM modules can cause that a lot more than regular
RAM. The argument is that this happens because PMEM doesn't necessarily
get scrubbed at boot like RAM does, but that is planned to be added for
the user space tooling.

Quoting Dan from another email:
"The exposure can be reduced in the volatile-RAM case by scanning for
and clearing errors before it is onlined as RAM. The userspace tooling
for that can be in place before v5.1-final. There's also runtime
notifications of errors via acpi_nfit_uc_error_notify() from
background scrubbers on the DIMM devices. With that mechanism the
kernel could proactively clear newly discovered poison in the volatile
case, but that would be additional development more suitable for v5.2.

I understand the concern, and the need to highlight this issue by
tapping the brakes on feature development, but I don't see PMEM as RAM
making the situation worse when the exposure is also there via DAX in
the PMEM case. Volatile-RAM is arguably a safer use case since it's
possible to repair pages where the persistent case needs active
application coordination"

* tag 'devdax-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
device-dax: "Hotplug" persistent memory for use like normal RAM
mm/resource: Let walk_system_ram_range() search child resources
mm/memory-hotplug: Allow memory resources to be children
mm/resource: Move HMM pr_debug() deeper into resource code
mm/resource: Return real error codes from walk failures
device-dax: Add a 'modalias' attribute to DAX 'bus' devices
device-dax: Add a 'target_node' attribute
device-dax: Auto-bind device after successful new_id
acpi/nfit, device-dax: Identify differentiated memory with a unique numa-node
device-dax: Add /sys/class/dax backwards compatibility
device-dax: Add support for a dax override driver
device-dax: Move resource pinning+mapping into the common driver
device-dax: Introduce bus + driver model
device-dax: Start defining a dax bus model
device-dax: Remove multi-resource infrastructure
device-dax: Kill dax_region base
device-dax: Kill dax_region ida

Linus Torvalds
2019-03-17 04:05:32 +0800

21 Feb, 2019

1 commit

ad428cdb5 dax: Check the end of the block-device capacity with dax_direct_access()

The checks in __bdev_dax_supported() helped mitigate a potential data
corruption bug in the pmem driver's handling of section alignment
padding. Strengthen the checks, including checking the end of the range,
to validate the dev_pagemap, Xarray entries, and sector-to-pfn
translation established for pmem namespaces.

Acked-by: Jan Kara
Cc: "Darrick J. Wong"
Signed-off-by: Dan Williams

Dan Williams
2019-02-21 13:12:50 +0800

07 Jan, 2019

2 commits

9567da0b4 device-dax: Introduce bus + driver model

In support of multiple device-dax instances per device-dax-region and
allowing the 'kmem' driver to attach to dax-instances instead of the
current device-node access, convert the dax sub-system from a class to a
bus. Recall that the kmem driver takes reserved / special purpose
memories and assigns them to be managed by the core-mm.

Aside from the fact the device-dax instances are registered and probed
on a bus, two other lifetime-management changes are made:

1/ Delay attaching a cdev until driver probe time

2/ A new run_dax() helper is introduced to allow restoring dax-operation
after a kill_dax() event. So, at driver ->probe() time we run_dax()
and at ->remove() time we kill_dax() and invalidate all mappings.
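
The probe/remove lifetime rule in item 2/ can be modelled in a few lines of userspace C. The struct and every model_* function here are hypothetical; the sketch only shows an "alive" state that is set at probe time, cleared at remove time, and checked before any new mapping is handed out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one "alive" bit gates all dax operations. */
struct model_dax_dev {
    bool alive;
    int mappings;   /* number of live mappings */
};

void model_run_dax(struct model_dax_dev *dev)
{
    dev->alive = true;
}

void model_kill_dax(struct model_dax_dev *dev)
{
    dev->alive = false;
}

/* Driver probe: (re)enable dax operation on the device. */
int model_probe(struct model_dax_dev *dev)
{
    model_run_dax(dev);
    return 0;
}

/* Driver remove: stop new operations, then invalidate all mappings. */
void model_remove(struct model_dax_dev *dev)
{
    model_kill_dax(dev);
    dev->mappings = 0;
}

/* New mappings are only allowed while the device is alive. */
bool model_map(struct model_dax_dev *dev)
{
    if (!dev->alive)
        return false;
    dev->mappings++;
    return true;
}
```

Because probe calls model_run_dax() again, a device that was killed by an earlier remove becomes usable once a driver binds to it a second time.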

Signed-off-by: Dan Williams

Dan Williams
2019-01-07 13:24:46 +0800

51cf784c4 device-dax: Start defining a dax bus model

Towards eliminating the dax_class, move the dax-device-attribute
enabling to a new bus.c file in the core. The amount of code
thrash in subsequent patches is reduced as no logic changes are made,
just pure code movement.

A temporary export of unregister_dev_dax() and dax_attribute_groups is
needed to preserve compilation, but those symbols become static again in
a follow-on patch.

Signed-off-by: Dan Williams

Dan Williams
2019-01-07 13:24:46 +0800

26 Aug, 2018

1 commit

828bf6e90 Merge tag 'libnvdimm-for-4.19_misc' of gitolite.kernel.org:pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Dave Jiang:
"Collection of misc libnvdimm patches for 4.19 submission:

- Adding support to read locked nvdimm capacity.

- Change test code to make DSM failure code injection an override.

- Add support for calculating the maximum contiguous area for a namespace.

- Add support for queueing a short ARS when there is an ongoing ARS for
nvdimm.

- Allow NULL to be passed in to ->direct_access() for kaddr and pfn
params.

- Improve smart injection support for nvdimm emulation testing.

- Fix test code that supports emulating controller temperature.

- Fix hang on error before devm_memremap_pages()

- Fix a bug that causes user memory corruption when data is returned to
the user for ars_status.

- Maintainer updates for Ross Zwisler emails and adding Jan Kara to
fsdax"

* tag 'libnvdimm-for-4.19_misc' of gitolite.kernel.org:pub/scm/linux/kernel/git/nvdimm/nvdimm:
libnvdimm: fix ars_status output length calculation
device-dax: avoid hang on error before devm_memremap_pages()
tools/testing/nvdimm: improve emulation of smart injection
filesystem-dax: Do not request kaddr and pfn when not required
md/dm-writecache: Don't request pointer dummy_addr when not required
dax/super: Do not request a pointer kaddr when not required
tools/testing/nvdimm: kaddr and pfn can be NULL to ->direct_access()
s390, dcssblk: kaddr and pfn can be NULL to ->direct_access()
libnvdimm, pmem: kaddr and pfn can be NULL to ->direct_access()
acpi/nfit: queue issuing of ars when an uc error notification comes in
libnvdimm: Export max available extent
libnvdimm: Use max contiguous area for namespace size
MAINTAINERS: Add Jan Kara for filesystem DAX
MAINTAINERS: update Ross Zwisler's email address
tools/testing/nvdimm: Fix support for emulating controller temperature
tools/testing/nvdimm: Make DSM failure code injection an override
acpi, nfit: Prefer _DSM over _LSR for namespace label reads
libnvdimm: Introduce locked DIMM capacity support

Linus Torvalds
2018-08-26 09:13:10 +0800

31 Jul, 2018

1 commit

e0b401e3f dax/super: Do not request a pointer kaddr when not required

Function __bdev_dax_supported doesn't need to get the local pointer kaddr
from direct_access. Pass NULL instead of a useless local pointer that the
caller then just throws away.
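
The optional out-parameter convention this relies on can be sketched as
below. This is an illustrative user-space model only:
model_direct_access(), its backing array, the base pfn, and the -1 error
value are all hypothetical and do not reproduce the kernel's
->direct_access() signature.

```c
#include <stddef.h>

/* Hypothetical stand-in for a ->direct_access() style routine. */
static long model_direct_access(unsigned long pgoff, long nr_pages,
                                void **kaddr, unsigned long *pfn)
{
    static char backing[4][4096];  /* pretend device memory: 4 pages */
    const long total = 4;

    if (nr_pages <= 0 || pgoff >= (unsigned long)total)
        return -1;                 /* placeholder error value */

    /* Out-parameters are optional: fill them only when requested. */
    if (kaddr)
        *kaddr = backing[pgoff];
    if (pfn)
        *pfn = 0x1000UL + pgoff;   /* made-up base pfn */

    /* Return how many pages are available from pgoff, capped at nr_pages. */
    long avail = total - (long)pgoff;
    return nr_pages < avail ? nr_pages : avail;
}
```

A caller that only wants to know whether the range is accessible can pass
NULL for both pointers and inspect the return value alone.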

Signed-off-by: Huaisheng Ye
Reviewed-by: Ross Zwisler
Signed-off-by: Dave Jiang

Huaisheng Ye
2018-07-31 00:39:28 +0800

29 Jun, 2018

1 commit

15256f6cc dax: check for QUEUE_FLAG_DAX in bdev_dax_supported()

Add an explicit check for QUEUE_FLAG_DAX to __bdev_dax_supported(). This
is needed for DM configurations where the first element in the dm-linear or
dm-stripe target supports DAX, but other elements do not. Without this
check __bdev_dax_supported() will pass for such devices, letting a
filesystem on that device mount with the DAX option.
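
A rough user-space sketch of why the extra check matters. All names here
(struct model_leg, struct model_stacked_dev, the model_* functions) are
hypothetical, and the model assumes the top-level flag is set only when
every underlying device supports DAX, which the message implies but does
not spell out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one underlying device in a stacked target. */
struct model_leg {
    bool dax_capable;
};

/* Hypothetical model of a dm-linear / dm-stripe style device. */
struct model_stacked_dev {
    const struct model_leg *legs;
    size_t nr_legs;
    bool queue_flag_dax;  /* stands in for QUEUE_FLAG_DAX on the top-level queue */
};

/* Assumption: the flag is set only if every leg supports DAX. */
static void model_set_queue_flag(struct model_stacked_dev *dev)
{
    dev->queue_flag_dax = dev->nr_legs > 0;
    for (size_t i = 0; i < dev->nr_legs; i++)
        if (!dev->legs[i].dax_capable)
            dev->queue_flag_dax = false;
}

/* Behaviour without the check: only the first leg ends up being probed. */
static bool model_supported_without_check(const struct model_stacked_dev *dev)
{
    return dev->nr_legs > 0 && dev->legs[0].dax_capable;
}

/* Behaviour with the check: consult the flag first, then probe. */
static bool model_supported_with_check(const struct model_stacked_dev *dev)
{
    if (!dev->queue_flag_dax)
        return false;
    return model_supported_without_check(dev);
}
```

With legs { DAX, non-DAX }, the unchecked variant reports support while
the checked variant does not, which is the mis-mount the commit closes.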

Signed-off-by: Ross Zwisler
Suggested-by: Mike Snitzer
Fixes: commit 545ed20e6df6 ("dm: add infrastructure for DAX support")
Cc: stable@vger.kernel.org
Acked-by: Dan Williams
Reviewed-by: Toshi Kani
Signed-off-by: Mike Snitzer

Ross Zwisler
2018-06-29 04:06:08 +0800

09 Jun, 2018

3 commits

7d3bf613e Merge tag 'libnvdimm-for-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Dan Williams:
"This adds a user for the new 'bytes-remaining' updates to
memcpy_mcsafe() that you already received through Ingo via the
x86-dax-for-linus pull.

Not included here, but still targeting this cycle, is support for
handling memory media errors (poison) consumed via userspace dax
mappings.

Summary:

- DAX broke a fundamental assumption of truncate of file mapped
pages. The truncate path assumed that it is safe to disconnect a
pinned page from a file and let the filesystem reclaim the physical
block. With DAX the page is equivalent to the filesystem block.
Introduce dax_layout_busy_page() to enable filesystems to wait for
pinned DAX pages to be released. Without this wait a filesystem
could allocate blocks under active device-DMA to a new file.

- DAX arranges for the block layer to be bypassed and uses
dax_direct_access() + copy_to_iter() to satisfy read(2) calls.
However, the memcpy_mcsafe() facility is available through the pmem
block driver. In order to safely handle media errors, via the DAX
block-layer bypass, introduce copy_to_iter_mcsafe().

- Fix cache management policy relative to the ACPI NFIT Platform
Capabilities Structure to properly elide cache flushes when they
are not necessary. The table indicates whether CPU caches are
power-fail protected. Clarify that a deep flush is always performed
on REQ_{FUA,PREFLUSH} requests"

* tag 'libnvdimm-for-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (21 commits)
dax: Use dax_write_cache* helpers
libnvdimm, pmem: Do not flush power-fail protected CPU caches
libnvdimm, pmem: Unconditionally deep flush on *sync
libnvdimm, pmem: Complete REQ_FLUSH => REQ_PREFLUSH
acpi, nfit: Remove ecc_unit_size
dax: dax_insert_mapping_entry always succeeds
libnvdimm, e820: Register all pmem resources
libnvdimm: Debug probe times
linvdimm, pmem: Preserve read-only setting for pmem devices
x86, nfit_test: Add unit test for memcpy_mcsafe()
pmem: Switch to copy_to_iter_mcsafe()
dax: Report bytes remaining in dax_iomap_actor()
dax: Introduce a ->copy_to_iter dax operation
uio, lib: Fix CONFIG_ARCH_HAS_UACCESS_MCSAFE compilation
xfs, dax: introduce xfs_break_dax_layouts()
xfs: prepare xfs_break_layouts() for another layout type
xfs: prepare xfs_break_layouts() to be called with XFS_MMAPLOCK_EXCL
mm, fs, dax: handle layout changes to pinned dax mappings
mm: fix __gup_device_huge vs unmap
mm: introduce MEMORY_DEVICE_FS_DAX and CONFIG_DEV_PAGEMAP_OPS
...

Linus Torvalds
2018-06-09 08:21:52 +0800
930218aff Merge branch 'for-4.18/mcsafe' into libnvdimm-for-next

Dan Williams
2018-06-09 06:16:44 +0800
b56845794 Merge branch 'for-4.18/dax' into libnvdimm-for-next

Dan Williams
2018-06-09 06:16:40 +0800

07 Jun, 2018

1 commit

808c340be dax: Use dax_write_cache* helpers

Use dax_write_cache() and dax_write_cache_enabled() instead of open coding
the bit operations.
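
For illustration, helpers of this kind wrap a flag bit roughly as below.
This is a user-space sketch under stated assumptions: struct
model_wc_dev, MODEL_WC_BIT, and the model_* names are hypothetical, and
the real helpers operate on the kernel's own dax device structure.

```c
#include <stdbool.h>

enum { MODEL_WC_BIT = 1 };  /* hypothetical bit number for "write cache" */

/* Hypothetical device with a flags word. */
struct model_wc_dev {
    unsigned long flags;
};

/* Set or clear the write-cache flag instead of open coding the bit math. */
static void model_dax_write_cache(struct model_wc_dev *dev, bool wc)
{
    if (wc)
        dev->flags |= 1UL << MODEL_WC_BIT;
    else
        dev->flags &= ~(1UL << MODEL_WC_BIT);
}

/* Query the write-cache flag. */
static bool model_dax_write_cache_enabled(const struct model_wc_dev *dev)
{
    return (dev->flags >> MODEL_WC_BIT) & 1UL;
}
```

Call sites then read as intent ("is the write cache enabled?") and no
longer need to know which bit is used.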

Signed-off-by: Ross Zwisler
Signed-off-by: Dan Williams

Ross Zwisler
2018-06-07 02:02:34 +0800

31 May, 2018

2 commits

80660f202 dax: change bdev_dax_supported() to support boolean returns

The function's return values are confusing given its name. Callers expect
a true or false result, but it actually returns 0/-errno, which makes the
calling code hard to read. Change it to return a bool: true if DAX is
supported and false if it is not.
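
The readability problem can be seen in a small user-space sketch. The
model_* functions are hypothetical and -EOPNOTSUPP is only a placeholder
error code; the point is how the same caller reads under each convention.

```c
#include <errno.h>
#include <stdbool.h>

/* Old convention: 0 means "supported", negative errno means "not supported". */
static int model_supported_errno(bool has_dax)
{
    return has_dax ? 0 : -EOPNOTSUPP;
}

/* New convention: true means "supported". */
static bool model_supported_bool(bool has_dax)
{
    return has_dax;
}

/* Caller under the old convention. */
static bool model_mount_with_dax_old(bool has_dax)
{
    /* Reads like "if not supported", yet this is the success path. */
    if (!model_supported_errno(has_dax))
        return true;
    return false;
}

/* Caller under the new convention: the condition says what it means. */
static bool model_mount_with_dax_new(bool has_dax)
{
    if (!model_supported_bool(has_dax))
        return false;
    return true;
}
```

Both callers make the same decision; only the second one can be read
without checking what the predicate returns.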

Signed-off-by: Dave Jiang
Signed-off-by: Ross Zwisler
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong

Dave Jiang
2018-05-31 23:58:34 +0800
ba23cba9b fs: allow per-device dax status checking for filesystems

Change bdev_dax_supported so it takes a bdev parameter. This enables
multi-device filesystems like xfs to check that a dax device can work for
the particular filesystem. Once that's in place, actually fix all the
parts of XFS where we need to be able to distinguish between datadev and
rtdev.

This patch fixes the problem where we screw up the dax support checking
in xfs if the datadev and rtdev have different dax capabilities.

Signed-off-by: Darrick J. Wong
[rez: Re-added __bdev_dax_supported() for !CONFIG_FS_DAX cases]
Signed-off-by: Ross Zwisler
Reviewed-by: Eric Sandeen

Darrick J. Wong
2018-05-31 23:58:33 +0800

23 May, 2018

1 commit

b3a9a0c36 dax: Introduce a ->copy_to_iter dax operation

Similar to the ->copy_from_iter() operation, a platform may want to
deploy an architecture or device specific routine for handling reads
from a dax_device like /dev/pmemX. On x86 this routine will point to a
machine check safe version of copy_to_iter(). For now, add the plumbing
to device-mapper and the dax core.
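
The shape of such an operation can be sketched in user space as a
function pointer in an ops table. Everything below is hypothetical:
struct model_dax_ops, the model_* functions, the flat-buffer signature
(the kernel uses an iov_iter), the fixed "poison" offset, and the
fallback to a plain copy when no routine is installed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical ops table with an optional device-specific read routine. */
struct model_dax_ops {
    size_t (*copy_to_iter)(void *dst, const void *src, size_t len);
};

/* Plain copy: always transfers everything. */
static size_t model_default_copy(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
    return len;
}

/* Pretend media error location, used only by the model below. */
static const size_t model_poison_offset = 4;

/* Stand-in for a machine-check-safe copy: stops short at the bad offset. */
static size_t model_mcsafe_copy(void *dst, const void *src, size_t len)
{
    size_t n = len < model_poison_offset ? len : model_poison_offset;

    memcpy(dst, src, n);
    return n;  /* bytes actually copied */
}

/* Dispatch through the ops table (fallback is an assumption of this sketch). */
static size_t model_dax_copy_to_iter(const struct model_dax_ops *ops,
                                     void *dst, const void *src, size_t len)
{
    if (ops && ops->copy_to_iter)
        return ops->copy_to_iter(dst, src, len);
    return model_default_copy(dst, src, len);
}
```

A short return value lets the caller report a partial read without
touching the bad region.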

Cc: Ross Zwisler
Cc: Mike Snitzer
Cc: Christoph Hellwig
Signed-off-by: Dan Williams

Dan Williams
2018-05-23 14:18:31 +0800

22 May, 2018

1 commit

e76384884 mm: introduce MEMORY_DEVICE_FS_DAX and CONFIG_DEV_PAGEMAP_OPS

In preparation for fixing dax-dma-vs-unmap issues, filesystems need to
be able to rely on the fact that they will get wakeups on dev_pagemap
page-idle events. Introduce MEMORY_DEVICE_FS_DAX and
generic_dax_page_free() as common indicator / infrastructure for dax
filesystems to require. With this change there are no users of the
MEMORY_DEVICE_HOST designation, so remove it.

The HMM sub-system extended dev_pagemap to arrange a callback when a
dev_pagemap managed page is freed. Since a dev_pagemap page is free /
idle when its reference count is 1 it requires an additional branch to
check the page-type at put_page() time. Given put_page() is a hot-path
we do not want to incur that check if HMM is not in use, so a static
branch is used to avoid that overhead when not necessary.

Now, the FS_DAX implementation wants to reuse this mechanism for
receiving dev_pagemap ->page_free() callbacks. Rework the HMM-specific
static-key into a generic mechanism that either HMM or FS_DAX code paths
can enable.
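
A simplified user-space model of the mechanism described above. All names
are hypothetical (struct model_page, model_put_page(), the
model_devmap_ops_enabled flag standing in for the static key), and the
model ignores the ordinary free-at-zero path entirely.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page with a reference count and an optional idle callback. */
struct model_page {
    int refcount;
    bool is_devmap;                              /* dev_pagemap managed page */
    void (*page_free)(struct model_page *page);  /* ->page_free() stand-in */
};

/* Stands in for the static key that HMM or FS_DAX code would enable. */
static bool model_devmap_ops_enabled;

/* Counts idle notifications so the effect is observable. */
static int model_idle_events;

/* Example callback: record that the page became idle. */
static void model_page_idle(struct model_page *page)
{
    (void)page;
    model_idle_events++;
}

/* Drop one reference; notify when a devmap page falls back to refcount 1. */
static void model_put_page(struct model_page *page)
{
    page->refcount--;

    /* Fast path: with no user of the mechanism, skip the page-type check. */
    if (!model_devmap_ops_enabled)
        return;

    if (page->is_devmap && page->refcount == 1 && page->page_free)
        page->page_free(page);
}
```

In the kernel the flag is a static branch, so the disabled case costs
close to nothing on the put_page() hot path; the plain bool here only
models the control flow.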

For ARCH=um builds, and any other arch that lacks ZONE_DEVICE support,
care must be taken to compile out the DEV_PAGEMAP_OPS infrastructure.
However, we still need to support FS_DAX in the FS_DAX_LIMITED case
implemented by the s390/dcssblk driver.

Cc: Martin Schwidefsky
Cc: Heiko Carstens
Cc: Michal Hocko
Reported-by: kbuild test robot
Reported-by: Thomas Meyer
Reported-by: Dave Jiang
Cc: "Jérôme Glisse"
Reviewed-by: Jan Kara
Reviewed-by: Christoph Hellwig
Signed-off-by: Dan Williams

Dan Williams
2018-05-22 21:59:39 +0800

31 Mar, 2018

1 commit

3fe0791c2 dax: store pfns in the radix

In preparation for examining the busy state of dax pages in the truncate
path, switch from sectors to pfns in the radix.

Cc: Jeff Moyer
Cc: Christoph Hellwig
Cc: Matthew Wilcox
Cc: Ross Zwisler
Reviewed-by: Jan Kara
Signed-off-by: Dan Williams

Dan Williams
2018-03-31 02:34:54 +0800

27 Feb, 2018

1 commit

9d4949b49 dax: ->direct_access does not sleep anymore

In Patch:
[7a862fb] brd: remove dax support

Dan Williams has removed the only might_sleep
implementation of ->direct_access.
So we no longer need to check for it.

CC: Dan Williams
Signed-off-by: Boaz Harrosh
Signed-off-by: Dan Williams

Boaz Harrosh
2018-02-27 04:32:29 +0800

20 Jan, 2018

1 commit

569d0365f dax: require 'struct page' by default for filesystem dax

If a dax buffer from a device that does not map pages is passed to
read(2) or write(2) as a target for direct-I/O it triggers SIGBUS. If
gdb attempts to examine the contents of a dax buffer from a device that
does not map pages it triggers SIGBUS. If fork(2) is called on a process
with a dax mapping from a device that does not map pages it triggers
SIGBUS. 'struct page' is required otherwise several kernel code paths
break in surprising ways. Disable filesystem-dax on devices that do not
map pages.

In addition to needing pfn_to_page() to be valid we also require devmap
pages. We need this to detect dax pages in the get_user_pages_fast()
path and so that we can stop managing the VM_MIXEDMAP flag. For DAX
drivers that have not supported get_user_pages() to date we allow them
to opt-in to supporting DAX with the CONFIG_FS_DAX_LIMITED configuration
option which requires ->direct_access() to return pfn_t_special() pfns.
This leaves DAX support in brd disabled and scheduled for removal.

Note that when the initial dax support was being merged a few years back
there was concern that struct page was unsuitable for use with next
generation persistent memory devices. The theoretical concern was that
struct page access, being such a hotly used data structure in the
kernel, would lead to media wear out. While that was a reasonable
conservative starting position it has not held true in practice. We have
long since committed to using devm_memremap_pages() to support higher
order kernel functionality that needs get_user_pages() and
pfn_to_page().

Cc: Jeff Moyer
Cc: Ross Zwisler
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Reviewed-by: Jan Kara
Reviewed-by: Christoph Hellwig
Reviewed-by: Gerald Schaefer
Signed-off-by: Dan Williams

Dan Williams
2018-01-20 08:50:53 +0800