28 Mar, 2020
1 commit
-
Current make_request based drivers use either blk_alloc_queue_node or
blk_alloc_queue to allocate a queue, and then set up the make_request_fn
function pointer and a few parameters using the blk_queue_make_request
helper. Simplify this by passing the make_request pointer to
blk_alloc_queue, and while at it merge the _node variant into the main
helper by always passing a node_id, and remove the superfluous gfp_mask
parameter. A lower-level __blk_alloc_queue is kept for the blk-mq case.Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
01 Feb, 2020
1 commit
-
After the removal of the device-public infrastructure there are only 2
->page_free() call backs in the kernel. One of those is a
device-private callback in the nouveau driver, the other is a generic
wakeup needed in the DAX case. In the hopes that all ->page_free()
callbacks can be migrated to common core kernel functionality, move the
device-private specific actions in __put_devmap_managed_page() under the
is_device_private_page() conditional, including the ->page_free()
callback. For the other page types just open-code the generic wakeup.Yes, the wakeup is only needed in the MEMORY_DEVICE_FSDAX case, but it
does no harm in the MEMORY_DEVICE_DEVDAX and MEMORY_DEVICE_PCI_P2PDMA
case.Link: http://lkml.kernel.org/r/20200107224558.2362728-4-jhubbard@nvidia.com
Signed-off-by: Dan Williams
Signed-off-by: John Hubbard
Reviewed-by: Christoph Hellwig
Reviewed-by: Jérôme Glisse
Cc: Jan Kara
Cc: Ira Weiny
Cc: Alex Williamson
Cc: Aneesh Kumar K.V
Cc: Björn Töpel
Cc: Daniel Vetter
Cc: Hans Verkuil
Cc: Jason Gunthorpe
Cc: Jason Gunthorpe
Cc: Jens Axboe
Cc: Jonathan Corbet
Cc: Kirill A. Shutemov
Cc: Leon Romanovsky
Cc: Mauro Carvalho Chehab
Cc: Mike Rapoport
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
02 Dec, 2019
2 commits
-
Pull libnvdimm updates from Dan Williams:
"The highlight this cycle is continuing integration fixes for PowerPC
and some resulting optimizations.Summary:
- Updates to better support vmalloc space restrictions on PowerPC
platforms.- Cleanups to move common sysfs attributes to core 'struct
device_type' objects.- Export the 'target_node' attribute (the effective numa node if pmem
is marked online) for regions and namespaces.- Miscellaneous fixups and optimizations"
* tag 'libnvdimm-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (21 commits)
MAINTAINERS: Remove Keith from NVDIMM maintainers
libnvdimm: Export the target_node attribute for regions and namespaces
dax: Add numa_node to the default device-dax attributes
libnvdimm: Simplify root read-only definition for the 'resource' attribute
dax: Simplify root read-only definition for the 'resource' attribute
dax: Create a dax device_type
libnvdimm: Move nvdimm_bus_attribute_group to device_type
libnvdimm: Move nvdimm_attribute_group to device_type
libnvdimm: Move nd_mapping_attribute_group to device_type
libnvdimm: Move nd_region_attribute_group to device_type
libnvdimm: Move nd_numa_attribute_group to device_type
libnvdimm: Move nd_device_attribute_group to device_type
libnvdimm: Move region attribute group definition
libnvdimm: Move attribute groups to device type
libnvdimm: Remove prototypes for nonexistent functions
libnvdimm/btt: fix variable 'rc' set but not used
libnvdimm/pmem: Delete include of nd-core.h
libnvdimm/namespace: Differentiate between probe mapping and runtime mapping
libnvdimm/pfn_dev: Don't clear device memmap area during generic namespace probe
libnvdimm: Trivial comment fix
... -
Pull removal of most of fs/compat_ioctl.c from Arnd Bergmann:
"As part of the cleanup of some remaining y2038 issues, I came to
fs/compat_ioctl.c, which still has a couple of commands that need
support for time64_t.In completely unrelated work, I spent time on cleaning up parts of
this file in the past, moving things out into drivers instead.After Al Viro reviewed an earlier version of this series and did a lot
more of that cleanup, I decided to try to completely eliminate the
rest of it and move it all into drivers.This series incorporates some of Al's work and many patches of my own,
but in the end stops short of actually removing the last part, which
is the scsi ioctl handlers. I have patches for those as well, but they
need more testing or possibly a rewrite"* tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (42 commits)
scsi: sd: enable compat ioctls for sed-opal
pktcdvd: add compat_ioctl handler
compat_ioctl: move SG_GET_REQUEST_TABLE handling
compat_ioctl: ppp: move simple commands into ppp_generic.c
compat_ioctl: handle PPPIOCGIDLE for 64-bit time_t
compat_ioctl: move PPPIOCSCOMPRESS to ppp_generic
compat_ioctl: unify copy-in of ppp filters
tty: handle compat PPP ioctls
compat_ioctl: move SIOCOUTQ out of compat_ioctl.c
compat_ioctl: handle SIOCOUTQNSD
af_unix: add compat_ioctl support
compat_ioctl: reimplement SG_IO handling
compat_ioctl: move WDIOC handling into wdt drivers
fs: compat_ioctl: move FITRIM emulation into file systems
gfs2: add compat_ioctl support
compat_ioctl: remove unused convert_in_user macro
compat_ioctl: remove last RAID handling code
compat_ioctl: remove /dev/raw ioctl translation
compat_ioctl: remove PCI ioctl translation
compat_ioctl: remove joystick ioctl translation
...
20 Nov, 2019
7 commits
-
Aneesh points out that some platforms may have "local" attached
persistent memory and "remote" persistent memory that map to the same
"online" node, or persistent memory devices with different performance
properties. In this case 'numa_node' is identical for the two instances,
but 'target_node' is differentiated so platform firmware can communicate
distinct performance properties per range. Expose 'target_node' by
default to allow for disambiguation of devices that share the same
numa_map_to_online_node() result.Reported-by: "Aneesh Kumar K.V"
Reviewed-by: Aneesh Kumar K.V
Link: https://lore.kernel.org/r/157401274500.43284.2369509941678577768.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams -
Rather than update the permission in ->is_visible() set the permission
directly at declaration time.Cc: Ira Weiny
Cc: Vishal Verma
Signed-off-by: Dan Williams
Reviewed-by: Aneesh Kumar K.V
Link: https://lore.kernel.org/r/157309905534.1582359.13927459228885931097.stgit@dwillia2-desk3.amr.corp.intel.com -
A 'struct device_type' instance can carry default attributes for the
device. Use this facility to remove the export of
nvdimm_bus_attribute_group and put the responsibility on the core rather
than leaf implementations to define this attribute.Cc: Ira Weiny
Cc: Michael Ellerman
Cc: "Oliver O'Halloran"
Cc: Vishal Verma
Cc: Aneesh Kumar K.V
Signed-off-by: Dan Williams
Reviewed-by: Aneesh Kumar K.V
Link: https://lore.kernel.org/r/157309903815.1582359.6418211876315050283.stgit@dwillia2-desk3.amr.corp.intel.com -
A 'struct device_type' instance can carry default attributes for the
device. Use this facility to remove the export of
nvdimm_attribute_group and put the responsibility on the core rather
than leaf implementations to define this attribute.Cc: Ira Weiny
Cc: Michael Ellerman
Cc: "Oliver O'Halloran"
Cc: Vishal Verma
Cc: Aneesh Kumar K.V
Signed-off-by: Dan Williams
Reviewed-by: Aneesh Kumar K.V
Link: https://lore.kernel.org/r/157309903201.1582359.10966209746585062329.stgit@dwillia2-desk3.amr.corp.intel.com -
A 'struct device_type' instance can carry default attributes for the
device. Use this facility to remove the export of
nd_mapping_attribute_group and put the responsibility on the core rather
than leaf implementations to define this attribute.Cc: Ira Weiny
Cc: Michael Ellerman
Cc: "Oliver O'Halloran"
Cc: Vishal Verma
Cc: Aneesh Kumar K.V
Signed-off-by: Dan Williams
Reviewed-by: Aneesh Kumar K.V
Link: https://lore.kernel.org/r/157309902686.1582359.6749533709859492704.stgit@dwillia2-desk3.amr.corp.intel.com -
A 'struct device_type' instance can carry default attributes for the
device. Use this facility to remove the export of
nd_region_attribute_group and put the responsibility on the core rather
than leaf implementations to define this attribute.Cc: Ira Weiny
Cc: Michael Ellerman
Cc: "Oliver O'Halloran"
Cc: Vishal Verma
Cc: Aneesh Kumar K.V
Signed-off-by: Dan Williams
Reviewed-by: Aneesh Kumar K.V
Link: https://lore.kernel.org/r/157309902169.1582359.16828508538444551337.stgit@dwillia2-desk3.amr.corp.intel.com -
A 'struct device_type' instance can carry default attributes for the
device. Use this facility to remove the export of
nd_numa_attribute_group and put the responsibility on the core rather
than leaf implementations to define this attribute.Cc: Ira Weiny
Cc: Michael Ellerman
Cc: "Oliver O'Halloran"
Cc: Vishal Verma
Cc: Aneesh Kumar K.V
Reviewed-by: Aneesh Kumar K.V
Link: https://lore.kernel.org/r/157401269537.43284.14411189404186877352.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams
18 Nov, 2019
6 commits
-
A 'struct device_type' instance can carry default attributes for the
device. Use this facility to remove the export of
nd_device_attribute_group and put the responsibility on the core rather
than leaf implementations to define this attribute.For regions this creates a new nd_region_attribute_groups[] added to the
per-region device-type instances.Cc: Ira Weiny
Cc: Michael Ellerman
Cc: "Oliver O'Halloran"
Cc: Vishal Verma
Cc: Aneesh Kumar K.V
Reviewed-by: Aneesh Kumar K.V
Link: https://lore.kernel.org/r/157309901138.1582359.12909354140826530394.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams -
In preparation for moving region attributes from device attribute groups
to the region device-type, reorder the declaration so that it can be
referenced by the device-type definition without forward declarations.
No functional changes are intended to result from this change.Cc: Ira Weiny
Cc: Vishal Verma
Signed-off-by: Dan Williams
Reviewed-by: Aneesh Kumar K.V
Link: https://lore.kernel.org/r/157309900624.1582359.6929998072035982264.stgit@dwillia2-desk3.amr.corp.intel.com -
Statically initialize the attribute groups for each libnvdimm
device_type. This is a preparation step for removing unnecessary exports
of attributes that can be included in the device_type by default.Also take the opportunity to mark 'struct device_type' instances const.
Cc: Ira Weiny
Cc: Vishal Verma
Signed-off-by: Dan Williams
Reviewed-by: Aneesh Kumar K.V
Link: https://lore.kernel.org/r/157309900111.1582359.2445687530383470348.stgit@dwillia2-desk3.amr.corp.intel.com -
These functions don't exist, so remove the prototypes for them.
Signed-off-by: Alastair D'Silva
Reviewed-by: Andrew Donnellan
Reviewed-by: Frederic Barrat
Link: https://lore.kernel.org/r/20191025044721.16617-3-alastair@au1.ibm.com
Signed-off-by: Dan Williams -
drivers/nvdimm/btt.c: In function 'btt_read_pg':
drivers/nvdimm/btt.c:1264:8: warning: variable 'rc' set but not used
[-Wunused-but-set-variable]
int rc;
^~Add a ratelimited message in case a storm of errors is encountered.
Fixes: d9b83c756953 ("libnvdimm, btt: rework error clearing")
Signed-off-by: Qian Cai
Reviewed-by: Vishal Verma
Link: https://lore.kernel.org/r/1572530719-32161-1-git-send-email-cai@lca.pw
Signed-off-by: Dan Williams -
The entire point of nd-core.h is to hide functionality that no leaf
driver should touch. In fact, the commit that added it had no need to
include it.Fixes: 06e8ccdab15f ("acpi: nfit: Add support for detect platform...")
Cc: Ira Weiny
Cc: Dave Jiang
Cc: Vishal Verma
Signed-off-by: Dan Williams
15 Nov, 2019
3 commits
-
The nvdimm core currently maps the full namespace to an ioremap range
while probing the namespace mode. This can result in probe failures on
architectures that have limited ioremap space.For example, with a large btt namespace that consumes most of I/O remap
range, depending on the sequence of namespace initialization, the user
can find a pfn namespace initialization failure due to unavailable I/O
remap space which nvdimm core uses for temporary mapping.nvdimm core can avoid this failure by only mapping the reserved info
block area to check for pfn superblock type and map the full namespace
resource only before using the namespace.Given that personalities like BTT can be layered on top of any namespace
type create a generic form of devm_nsio_enable (devm_namespace_enable)
and use it inside the per-personality attach routines. Now
devm_namespace_enable() is always paired with disable unless the mapping
is going to be used for long term runtime access.Signed-off-by: Aneesh Kumar K.V
Link: https://lore.kernel.org/r/20191017073308.32645-1-aneesh.kumar@linux.ibm.com
[djbw: reworks to move devm_namespace_{en,dis}able into *attach helpers]
Reported-by: kbuild test robot
Link: https://lore.kernel.org/r/20191031105741.102793-2-aneesh.kumar@linux.ibm.com
Signed-off-by: Dan Williams -
nvdimm core use nd_pfn_validate when looking for devdax or fsdax namespace. In this
case device resources are allocated against nd_namespace_io dev. In-order to
allow remap of range in nd_pfn_clear_memmap_error(), move the device memmap
area clearing while initializing pfn namespace. With this device
resource are allocated against nd_pfn and we can use nd_pfn->dev for remapping.This also avoids calling nd_pfn_clear_mmap_errors twice. Once while probing the
namespace and second while initializing a pfn namespace.Signed-off-by: Aneesh Kumar K.V
Link: https://lore.kernel.org/r/20191101032728.113001-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Dan Williams -
Don't leave claim_class set to an invalid value if an error occurs in
btt_claim_class().While we are here change the return type of __holder_class_store() to be
clear about the values it is returning.This was found via code inspection.
Reported-by: Dan Carpenter
Reviewed-by: Vishal Verma
Signed-off-by: Ira Weiny
Link: https://lore.kernel.org/r/20190925211348.14082-1-ira.weiny@intel.com
Signed-off-by: Dan Williams
07 Nov, 2019
1 commit
-
In preparation for handling platform differentiated memory types beyond
persistent memory, uplevel the "region" identifier to a global number
space. This enables a device-dax instance to be registered to any memory
type with guaranteed unique names.Signed-off-by: Dan Williams
Acked-by: Thomas Gleixner
Signed-off-by: Rafael J. Wysocki
23 Oct, 2019
1 commit
-
The .ioctl and .compat_ioctl file operations have the same prototype so
they can both point to the same function, which works great almost all
the time when all the commands are compatible.One exception is the s390 architecture, where a compat pointer is only
31 bit wide, and converting it into a 64-bit pointer requires calling
compat_ptr(). Most drivers here will never run in s390, but since we now
have a generic helper for it, it's easy enough to use it consistently.I double-checked all these drivers to ensure that all ioctl arguments
are used as pointers or are ignored, but are not interpreted as integer
values.Acked-by: Jason Gunthorpe
Acked-by: Daniel Vetter
Acked-by: Mauro Carvalho Chehab
Acked-by: Greg Kroah-Hartman
Acked-by: David Sterba
Acked-by: Darren Hart (VMware)
Acked-by: Jonathan Cameron
Acked-by: Bjorn Andersson
Acked-by: Dan Williams
Signed-off-by: Arnd Bergmann
30 Sep, 2019
1 commit
-
More libnvdimm updates from Dan Williams:
- Complete the reworks to interoperate with powerpc dynamic huge page
sizes- Fix a crash due to missed accounting for the powerpc 'struct
page'-memmap mapping granularity- Fix badblock initialization for volatile (DRAM emulated) pmem ranges
- Stop triggering request_key() notifications to userspace when
NVDIMM-security is disabled / not present- Miscellaneous small fixups
* tag 'libnvdimm-fixes-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
libnvdimm/region: Enable MAP_SYNC for volatile regions
libnvdimm: prevent nvdimm from requesting key when security is disabled
libnvdimm/region: Initialize bad block for volatile namespaces
libnvdimm/nfit_test: Fix acpi_handle redefinition
libnvdimm/altmap: Track namespace boundaries in altmap
libnvdimm: Fix endian conversion issues
libnvdimm/dax: Pick the right alignment default when creating dax devices
powerpc/book3s64: Export has_transparent_hugepage() related functions.
25 Sep, 2019
6 commits
-
Some environments want to use a host tmpfs/ramdisk to back guest pmem.
While the data is not persisted relative to the host it *is* persisted
relative to guest crashes / reboots. The guest is free to use dax and
MAP_SYNC to keep filesystem metadata consistent with dax accesses
without requiring guest fsync(). The guest can also observe that the
region is volatile and skip cache flushing as global visibility is
enough to "persist" data relative to the host staying alive over guest
reset events.Signed-off-by: Aneesh Kumar K.V
Reviewed-by: Pankaj Gupta
Link: https://lore.kernel.org/r/20190924114327.14700-1-aneesh.kumar@linux.ibm.com
[djbw: reword the changelog]
Signed-off-by: Dan Williams -
Current implementation attempts to request keys from the keyring even when
security is not enabled. Change behavior so when security is disabled it
will skip key request.Error messages seen when no keys are installed and libnvdimm is loaded:
request-key[4598]: Cannot find command to construct key 661489677
request-key[4606]: Cannot find command to construct key 34713726Cc: stable@vger.kernel.org
Fixes: 4c6926a23b76 ("acpi/nfit, libnvdimm: Add unlock of nvdimm support for Intel DIMMs")
Signed-off-by: Dave Jiang
Link: https://lore.kernel.org/r/156934642272.30222.5230162488753445916.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dan Williams -
We do check for a bad block during namespace init and that use
region bad block list. We need to initialize the bad block
for volatile regions for this to work. We also observe a lockdep
warning as below because the lock is not initialized correctly
since we skip bad block init for volatile regions.INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.3.0-rc1-15699-g3dee241c937e #149
Call Trace:
[c0000000f95cb250] [c00000000147dd84] dump_stack+0xe8/0x164 (unreliable)
[c0000000f95cb2a0] [c00000000022ccd8] register_lock_class+0x308/0xa60
[c0000000f95cb3a0] [c000000000229cc0] __lock_acquire+0x170/0x1ff0
[c0000000f95cb4c0] [c00000000022c740] lock_acquire+0x220/0x270
[c0000000f95cb580] [c000000000a93230] badblocks_check+0xc0/0x290
[c0000000f95cb5f0] [c000000000d97540] nd_pfn_validate+0x5c0/0x7f0
[c0000000f95cb6d0] [c000000000d98300] nd_dax_probe+0xd0/0x1f0
[c0000000f95cb760] [c000000000d9b66c] nd_pmem_probe+0x10c/0x160
[c0000000f95cb790] [c000000000d7f5ec] nvdimm_bus_probe+0x10c/0x240
[c0000000f95cb820] [c000000000d0f844] really_probe+0x254/0x4e0
[c0000000f95cb8b0] [c000000000d0fdfc] driver_probe_device+0x16c/0x1e0
[c0000000f95cb930] [c000000000d10238] device_driver_attach+0x68/0xa0
[c0000000f95cb970] [c000000000d1040c] __driver_attach+0x19c/0x1c0
[c0000000f95cb9f0] [c000000000d0c4c4] bus_for_each_dev+0x94/0x130
[c0000000f95cba50] [c000000000d0f014] driver_attach+0x34/0x50
[c0000000f95cba70] [c000000000d0e208] bus_add_driver+0x178/0x2f0
[c0000000f95cbb00] [c000000000d117c8] driver_register+0x108/0x170
[c0000000f95cbb70] [c000000000d7edb0] __nd_driver_register+0xe0/0x100
[c0000000f95cbbd0] [c000000001a6baa4] nd_pmem_driver_init+0x34/0x48
[c0000000f95cbbf0] [c0000000000106f4] do_one_initcall+0x1d4/0x4b0
[c0000000f95cbcd0] [c0000000019f499c] kernel_init_freeable+0x544/0x65c
[c0000000f95cbdb0] [c000000000010d6c] kernel_init+0x2c/0x180
[c0000000f95cbe20] [c00000000000b954] ret_from_kernel_thread+0x5c/0x68Signed-off-by: Aneesh Kumar K.V
Link: https://lore.kernel.org/r/20190919083355.26340-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Dan Williams -
With PFN_MODE_PMEM namespace, the memmap area is allocated from the device
area. Some architectures map the memmap area with large page size. On
architectures like ppc64, 16MB page for memap mapping can map 262144 pfns.
This maps a namespace size of 16G.When populating memmap region with 16MB page from the device area,
make sure the allocated space is not used to map resources outside this
namespace. Such usage of device area will prevent a namespace destroy.Add resource end pnf in altmap and use that to check if the memmap area
allocation can map pfn outside the namespace. On ppc64 in such case we fallback
to allocation from memory.This fix kernel crash reported below:
[ 132.034989] WARNING: CPU: 13 PID: 13719 at mm/memremap.c:133 devm_memremap_pages_release+0x2d8/0x2e0
[ 133.464754] BUG: Unable to handle kernel data access at 0xc00c00010b204000
[ 133.464760] Faulting instruction address: 0xc00000000007580c
[ 133.464766] Oops: Kernel access of bad area, sig: 11 [#1]
[ 133.464771] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
.....
[ 133.464901] NIP [c00000000007580c] vmemmap_free+0x2ac/0x3d0
[ 133.464906] LR [c0000000000757f8] vmemmap_free+0x298/0x3d0
[ 133.464910] Call Trace:
[ 133.464914] [c000007cbfd0f7b0] [c0000000000757f8] vmemmap_free+0x298/0x3d0 (unreliable)
[ 133.464921] [c000007cbfd0f8d0] [c000000000370a44] section_deactivate+0x1a4/0x240
[ 133.464928] [c000007cbfd0f980] [c000000000386270] __remove_pages+0x3a0/0x590
[ 133.464935] [c000007cbfd0fa50] [c000000000074158] arch_remove_memory+0x88/0x160
[ 133.464942] [c000007cbfd0fae0] [c0000000003be8c0] devm_memremap_pages_release+0x150/0x2e0
[ 133.464949] [c000007cbfd0fb70] [c000000000738ea0] devm_action_release+0x30/0x50
[ 133.464955] [c000007cbfd0fb90] [c00000000073a5a4] release_nodes+0x344/0x400
[ 133.464961] [c000007cbfd0fc40] [c00000000073378c] device_release_driver_internal+0x15c/0x250
[ 133.464968] [c000007cbfd0fc80] [c00000000072fd14] unbind_store+0x104/0x110
[ 133.464973] [c000007cbfd0fcd0] [c00000000072ee24] drv_attr_store+0x44/0x70
[ 133.464981] [c000007cbfd0fcf0] [c0000000004a32bc] sysfs_kf_write+0x6c/0xa0
[ 133.464987] [c000007cbfd0fd10] [c0000000004a1dfc] kernfs_fop_write+0x17c/0x250
[ 133.464993] [c000007cbfd0fd60] [c0000000003c348c] __vfs_write+0x3c/0x70
[ 133.464999] [c000007cbfd0fd80] [c0000000003c75d0] vfs_write+0xd0/0x250djbw: Aneesh notes that this crash can likely be triggered in any kernel that
supports 'papr_scm', so flagging that commit for -stable consideration.Fixes: b5beae5e224f ("powerpc/pseries: Add driver for PAPR SCM regions")
Cc:
Reported-by: Sachin Sant
Signed-off-by: Aneesh Kumar K.V
Reviewed-by: Pankaj Gupta
Tested-by: Santosh Sivaraj
Reviewed-by: Johannes Thumshirn
Link: https://lore.kernel.org/r/20190910062826.10041-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Dan Williams -
nd_label->dpa issue was observed when trying to enable the namespace created
with little-endian kernel on a big-endian kernel. That made me run
`sparse` on the rest of the code and other changes are the result of that.Fixes: d9b83c756953 ("libnvdimm, btt: rework error clearing")
Fixes: 9dedc73a4658 ("libnvdimm/btt: Fix LBA masking during 'free list' population")
Reviewed-by: Vishal Verma
Signed-off-by: Aneesh Kumar K.V
Link: https://lore.kernel.org/r/20190809074726.27815-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Dan Williams -
Allow arch to provide the supported alignments and use hugepage alignment only
if we support hugepage. Right now we depend on compile time configs whereas this
patch switch this to runtime discovery.Architectures like ppc64 can have THP enabled in code, but then can have
hugepage size disabled by the hypervisor. This allows us to create dax devices
with PAGE_SIZE alignment in this case.Existing dax namespace with alignment larger than PAGE_SIZE will fail to
initialize in this specific case. We still allow fsdax namespace initialization.With respect to identifying whether to enable hugepage fault for a dax device,
if THP is enabled during compile, we default to taking hugepage fault and in dax
fault handler if we find the fault size > alignment we retry with PAGE_SIZE
fault size.This also addresses the below failure scenario on ppc64
ndctl create-namespace --mode=devdax | grep align
"align":16777216,
"align":16777216cat /sys/devices/ndbus0/region0/dax0.0/supported_alignments
65536 16777216daxio.static-debug -z -o /dev/dax0.0
Bus error (core dumped)$ dmesg | tail
lpar: Failed hash pte insert with error -4
hash-mmu: mm: Hashing failure ! EA=0x7fff17000000 access=0x8000000000000006 current=daxio
hash-mmu: trap=0x300 vsid=0x22cb7a3 ssize=1 base psize=2 psize 10 pte=0xc000000501002b86
daxio[3860]: bus error (7) at 7fff17000000 nip 7fff973c007c lr 7fff973bff34 code 2 in libpmem.so.1.0.0[7fff973b0000+20000]
daxio[3860]: code: 792945e4 7d494b78 e95f0098 7d494b78 f93f00a0 4800012c e93f0088 f93f0120
daxio[3860]: code: e93f00a0 f93f0128 e93f0120 e95f0128 e93f0088 39290008 f93f0110The failure was due to guest kernel using wrong page size.
The namespaces created with 16M alignment will appear as below on a config with
16M page size disabled.$ ndctl list -Ni
[
{
"dev":"namespace0.1",
"mode":"fsdax",
"map":"dev",
"size":5351931904,
"uuid":"fc6e9667-461a-4718-82b4-69b24570bddb",
"align":16777216,
"blockdev":"pmem0.1",
"supported_alignments":[
65536
]
},
{
"dev":"namespace0.0",
"mode":"fsdax",
Signed-off-by: Aneesh Kumar K.V
Link: https://lore.kernel.org/r/20190905154603.10349-8-aneesh.kumar@linux.ibm.com
Signed-off-by: Dan Williams
22 Sep, 2019
2 commits
-
Pull libnvdimm updates from Dan Williams:
"Some reworks to better support nvdimms on powerpc and an nvdimm
security interface update:- Rework the nvdimm core to accommodate architectures with different
page sizes and ones that can change supported huge page sizes at
boot time rather than a compile time constant.- Introduce a distinct 'frozen' attribute for the nvdimm security
state since it is independent of the locked state.- Miscellaneous fixups"
* tag 'libnvdimm-for-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
libnvdimm: Use PAGE_SIZE instead of SZ_4K for align check
libnvdimm/label: Remove the dpa align check
libnvdimm/pfn_dev: Add page size and struct page size to pfn superblock
libnvdimm/pfn_dev: Add a build check to make sure we notice when struct page size change
libnvdimm/pmem: Advance namespace seed for specific probe errors
libnvdimm/region: Rewrite _probe_success() to _advance_seeds()
libnvdimm/security: Consolidate 'security' operations
libnvdimm/security: Tighten scope of nvdimm->busy vs security operations
libnvdimm/security: Introduce a 'frozen' attribute
libnvdimm, region: Use struct_size() in kzalloc()
tools/testing/nvdimm: Fix fallthrough warning
libnvdimm/of_pmem: Provide a unique name for bus provider -
Pull hmm updates from Jason Gunthorpe:
"This is more cleanup and consolidation of the hmm APIs and the very
strongly related mmu_notifier interfaces. Many places across the tree
using these interfaces are touched in the process. Beyond that a
cleanup to the page walker API and a few memremap related changes
round out the series:- General improvement of hmm_range_fault() and related APIs, more
documentation, bug fixes from testing, API simplification &
consolidation, and unused API removal- Simplify the hmm related kconfigs to HMM_MIRROR and DEVICE_PRIVATE,
and make them internal kconfig selects- Hoist a lot of code related to mmu notifier attachment out of
drivers by using a refcount get/put attachment idiom and remove the
convoluted mmu_notifier_unregister_no_release() and related APIs.- General API improvement for the migrate_vma API and revision of its
only user in nouveau- Annotate mmu_notifiers with lockdep and sleeping region debugging
Two series unrelated to HMM or mmu_notifiers came along due to
dependencies:- Allow pagemap's memremap_pages family of APIs to work without
providing a struct device- Make walk_page_range() and related use a constant structure for
function pointers"* tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (75 commits)
libnvdimm: Enable unit test infrastructure compile checks
mm, notifier: Catch sleeping/blocking for !blockable
kernel.h: Add non_block_start/end()
drm/radeon: guard against calling an unpaired radeon_mn_unregister()
csky: add missing brackets in a macro for tlb.h
pagewalk: use lockdep_assert_held for locking validation
pagewalk: separate function pointers from iterator data
mm: split out a new pagewalk.h header from mm.h
mm/mmu_notifiers: annotate with might_sleep()
mm/mmu_notifiers: prime lockdep
mm/mmu_notifiers: add a lockdep map for invalidate_range_start/end
mm/mmu_notifiers: remove the __mmu_notifier_invalidate_range_start/end exports
mm/hmm: hmm_range_fault() infinite loop
mm/hmm: hmm_range_fault() NULL pointer bug
mm/hmm: fix hmm_range_fault()'s handling of swapped out pages
mm/mmu_notifiers: remove unregister_no_release
RDMA/odp: remove ib_ucontext from ib_umem
RDMA/odp: use mmu_notifier_get/put for 'struct ib_ucontext_per_mm'
RDMA/mlx5: Use odp instead of mr->umem in pagefault_mr
RDMA/mlx5: Use ib_umem_start instead of umem.address
...
07 Sep, 2019
1 commit
-
The infrastructure to mock core libnvdimm routines for unit testing
purposes is prone to bitrot relative to refactoring of that core. Arrange
for the unit test core to be built when CONFIG_COMPILE_TEST=y. This does
not result in a functional unit test environment, it is only a helper for
0day to catch unit test build regressions.Note that there are a few x86isms in the implementation, so this does not
bother compile testing this architectures other than 64-bit x86.Link: https://lore.kernel.org/r/156763690875.2556198.15786177395425033830.stgit@dwillia2-desk3.amr.corp.intel.com
Reported-by: Christoph Hellwig
Signed-off-by: Dan Williams
Signed-off-by: Jason Gunthorpe
06 Sep, 2019
6 commits
-
Architectures have different page size than 4K. Use the PAGE_SIZE
to make sure ranges are correctly aligned.Signed-off-by: Aneesh Kumar K.V
Link: https://lore.kernel.org/r/20190905154603.10349-7-aneesh.kumar@linux.ibm.com
Signed-off-by: Dan Williams -
There's no strict requirement why slot_valid() needs to check for page alignment
and it would seem to actively hurt cross-page-size compatibility. Let's
delete the check and rely on checksum validation.Signed-off-by: Aneesh Kumar K.V
Link: https://lore.kernel.org/r/20190905154603.10349-6-aneesh.kumar@linux.ibm.com
Signed-off-by: Dan Williams -
This is needed so that pmem probe don't wrongly initialize a namespace
which doesn't have enough space reserved for holding struct pages
with the current kernel.Signed-off-by: Aneesh Kumar K.V
Link: https://lore.kernel.org/r/20190905154603.10349-5-aneesh.kumar@linux.ibm.com
Signed-off-by: Dan Williams -
Namespaces created with PFN_MODE_PMEM mode stores struct page in the reserve
block area. We need to make sure we account for the right struct page
size while doing this. Instead of directly depending on sizeof(struct page)
which can change based on different kernel config option, use the max struct
page size (64) while calculating the reserve block area. This makes sure pmem
device can be used across kernels built with different configs.If the above assumption of max struct page size change, we need to update the
reserve block allocation space for new namespaces created.Signed-off-by: Aneesh Kumar K.V
Link: https://lore.kernel.org/r/20190905154603.10349-4-aneesh.kumar@linux.ibm.com
Signed-off-by: Dan Williams -
In order to support marking namespaces with unsupported feature/versions
disabled, nvdimm core should advance the namespace seed on these
probe failures. Otherwise, these failed namespaces will be considered a
seed namespace and will be wrongly used while creating new namespaces.Add -EOPNOTSUPP as return from pmem probe callback to indicate a namespace
initialization failures due to pfn superblock feature/version mismatch.Signed-off-by: Aneesh Kumar K.V
Link: https://lore.kernel.org/r/20190905154603.10349-3-aneesh.kumar@linux.ibm.com
Signed-off-by: Dan Williams -
The nd_region_probe_success() helper collides seed management with
nvdimm->busy tracking. Given the 'busy' increment is handled internal to the
nd_region driver 'probe' path move the decrement to the 'remove' path.
With that cleanup the routine can be renamed to the more descriptive
nd_region_advance_seeds().The change is prompted by an incoming need to optionally advance the
seeds on other events besides 'probe' success.Cc: "Aneesh Kumar K.V"
Signed-off-by: Dan Williams
Signed-off-by: Aneesh Kumar K.V
Link: https://lore.kernel.org/r/20190905154603.10349-2-aneesh.kumar@linux.ibm.com
Signed-off-by: Dan Williams
30 Aug, 2019
2 commits
-
The security operations are exported from libnvdimm/security.c to
libnvdimm/dimm_devs.c, and libnvdimm/security.c is optionally compiled
based on the CONFIG_NVDIMM_KEYS config symbol.Rather than export the operations across compile objects, just move the
__security_store() entry point to live with the helpers.Acked-by: Jeff Moyer
Reviewed-by: Dave Jiang
Link: https://lore.kernel.org/r/156686730515.184120.10522747907309996674.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams -
An attempt to freeze DIMMs currently runs afoul of default blocking of
all security operations in the entry to the 'store' routine for the
'security' sysfs attribute.The blanket blocking of all security operations while the DIMM is in
active use in a region is too restrictive. The only security operations
that need to be aware of the ->busy state are those that mutate the
state of data, i.e. erase and overwrite.Refactor the ->busy checks to be applied at the entry common entry point
in __security_store() rather than each of the helper routines to enable
freeze to be run regardless of busy state.Reviewed-by: Dave Jiang
Reviewed-by: Jeff Moyer
Link: https://lore.kernel.org/r/156686729996.184120.3458026302402493937.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams