30 Dec, 2020

2 commits

  • commit 2dd2a1740ee19cd2636d247276cf27bfa434b0e2 upstream.

    A recent change to ndctl to attempt to reconfigure namespaces in place
    uncovered a label accounting problem in block-window-type namespaces.
    The ndctl "create.sh" test is able to trigger this signature:

    WARNING: CPU: 34 PID: 9167 at drivers/nvdimm/label.c:1100 __blk_label_update+0x9a3/0xbc0 [libnvdimm]
    [..]
    RIP: 0010:__blk_label_update+0x9a3/0xbc0 [libnvdimm]
    [..]
    Call Trace:
    uuid_store+0x21b/0x2f0 [libnvdimm]
    kernfs_fop_write+0xcf/0x1c0
    vfs_write+0xcc/0x380
    ksys_write+0x68/0xe0

    When allocated capacity for a namespace is renamed (new UUID) the labels
    with the old UUID need to be deleted. The ndctl behavior to always
    destroy namespaces on reconfiguration hid this problem.
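
    As a rough illustration of the rule above (labels that still carry the
    old UUID must be deleted when the capacity is renamed), the userspace
    sketch below frees every label slot tagged with the old UUID.
    struct label_slot, NSLOT and delete_labels_for_uuid() are hypothetical
    simplifications, not libnvdimm's label structures or the actual
    __blk_label_update() logic.

```c
#include <stdbool.h>
#include <string.h>

#define NSLOT 8
#define UUID_LEN 16

/* Hypothetical, simplified label slot (not struct nd_namespace_label). */
struct label_slot {
	bool in_use;
	unsigned char uuid[UUID_LEN];
};

/*
 * On rename, free every slot that still carries the old UUID so stale
 * labels do not keep claiming capacity. Returns the number of slots freed.
 */
static int delete_labels_for_uuid(struct label_slot *slots, int nslot,
				  const unsigned char *old_uuid)
{
	int freed = 0;

	for (int i = 0; i < nslot; i++) {
		if (slots[i].in_use &&
		    memcmp(slots[i].uuid, old_uuid, UUID_LEN) == 0) {
			slots[i].in_use = false;
			freed++;
		}
	}
	return freed;
}
```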

    The immediate impact of this bug is limited since block-window-type
    namespaces only seem to exist in the specification and not in any
    shipping products. However, the label handling code is being reused for
    other technologies like CXL region labels, so there is a benefit to
    making sure both vertical label sets (block-window) and horizontal
    label sets (pmem) have a functional reference implementation in
    libnvdimm.

    Fixes: c4703ce11c23 ("libnvdimm/namespace: Fix label tracking error")
    Cc:
    Cc: Vishal Verma
    Cc: Dave Jiang
    Cc: Ira Weiny
    Signed-off-by: Dan Williams
    Signed-off-by: Greg Kroah-Hartman

    Dan Williams
     
  • [ Upstream commit 4c46764733c85b82c07e9559b39da4d00a7dd659 ]

    The error code was not set when nd_label_alloc_slot() failed. Set it
    in that path so the failure is not masked by a previously assigned
    return value.
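
    A minimal sketch of the bug pattern, assuming a hypothetical
    alloc_slot() that signals failure with UINT_MAX. The choice of -ENXIO
    for the failure path is an assumption of this sketch.

```c
#include <errno.h>
#include <limits.h>

/* Hypothetical allocator: returns UINT_MAX when no free slot remains. */
static unsigned int alloc_slot(unsigned int *free_slots)
{
	if (*free_slots == 0)
		return UINT_MAX;
	return --(*free_slots);
}

/*
 * rc still holds the result of an earlier, successful step (0). Without
 * the assignment in the failure branch, the function would report
 * success even though no slot was allocated.
 */
static int update_label(unsigned int *free_slots)
{
	int rc = 0;	/* pretend an earlier step succeeded */
	unsigned int slot;

	slot = alloc_slot(free_slots);
	if (slot == UINT_MAX) {
		rc = -ENXIO;	/* the fix: report the failure */
		goto out;
	}
	/* ... write the label into 'slot' ... */
out:
	return rc;
}
```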

    Fixes: 0ba1c634892b ("libnvdimm: write blk label set")
    Signed-off-by: Zhang Qilong
    Link: https://lore.kernel.org/r/20201205115056.2076523-1-zhangqilong3@huawei.com
    Signed-off-by: Dan Williams
    Signed-off-by: Sasha Levin

    Zhang Qilong
     

14 Oct, 2020

3 commits

  • In support of device-dax growing the ability to front physically
    discontiguous ranges of memory, update devm_memremap_pages() to track
    multiple ranges with a single reference counter and devm instance.

    Convert all [devm_]memremap_pages() users to specify the number of ranges
    they are mapping in their 'struct dev_pagemap' instance.
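
    The idea of one pagemap describing several discontiguous ranges plus a
    count can be sketched with simplified userspace stand-ins.
    struct pagemap_sketch and pagemap_total_len() are hypothetical; the
    inclusive-end struct range mirrors the kernel's struct range, but the
    fixed-size array is only for brevity.

```c
#include <stdint.h>

/* Simplified stand-in for the kernel's struct range (inclusive end). */
struct range {
	uint64_t start;
	uint64_t end;
};

static inline uint64_t range_len(const struct range *r)
{
	return r->end - r->start + 1;
}

/* Hypothetical multi-range pagemap: a count plus the ranges themselves. */
struct pagemap_sketch {
	int nr_range;
	struct range ranges[4];	/* fixed size only to keep the sketch small */
};

/* Total bytes covered by all (possibly discontiguous) ranges. */
static uint64_t pagemap_total_len(const struct pagemap_sketch *pgmap)
{
	uint64_t total = 0;

	for (int i = 0; i < pgmap->nr_range; i++)
		total += range_len(&pgmap->ranges[i]);
	return total;
}
```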

    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Benjamin Herrenschmidt
    Cc: Vishal Verma
    Cc: Vivek Goyal
    Cc: Dave Jiang
    Cc: Ben Skeggs
    Cc: David Airlie
    Cc: Daniel Vetter
    Cc: Ira Weiny
    Cc: Bjorn Helgaas
    Cc: Boris Ostrovsky
    Cc: Juergen Gross
    Cc: Stefano Stabellini
    Cc: "Jérôme Glisse"
    Cc: Ard Biesheuvel
    Cc: Ard Biesheuvel
    Cc: Borislav Petkov
    Cc: Brice Goglin
    Cc: Catalin Marinas
    Cc: Dave Hansen
    Cc: David Hildenbrand
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Hulk Robot
    Cc: Ingo Molnar
    Cc: Jason Gunthorpe
    Cc: Jason Yan
    Cc: Jeff Moyer
    Cc: "Jérôme Glisse"
    Cc: Jia He
    Cc: Joao Martins
    Cc: Jonathan Cameron
    Cc: kernel test robot
    Cc: Mike Rapoport
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Randy Dunlap
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Wei Yang
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/159643103789.4062302.18426128170217903785.stgit@dwillia2-desk3.amr.corp.intel.com
    Link: https://lkml.kernel.org/r/160106116293.30709.13350662794915396198.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • The 'struct resource' in 'struct dev_pagemap' is only used for holding
    resource span information. The other fields, 'name', 'flags', 'desc',
    'parent', 'sibling', and 'child' are all unused and waste space.

    This is in preparation for introducing a multi-range extension of
    devm_memremap_pages().

    The bulk of this change is unwinding all the places internal to libnvdimm
    that used 'struct resource' unnecessarily, and replacing instances of
    'struct dev_pagemap'.res with 'struct dev_pagemap'.range.

    P2PDMA had a minor usage of the resource flags field, but only to report
    failures with "%pR". That is replaced with an open coded print of the
    range.
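
    An open-coded print of a range, standing in for "%pR", might look like
    the sketch below. The exact output format chosen for P2PDMA is not
    stated here; the "[start-end]" layout and format_range() are
    assumptions.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified stand-in for the kernel's struct range (inclusive end). */
struct range {
	uint64_t start;
	uint64_t end;
};

/* Print the span by hand instead of relying on a struct resource. */
static int format_range(char *buf, size_t len, const struct range *r)
{
	return snprintf(buf, len, "[%#" PRIx64 "-%#" PRIx64 "]",
			r->start, r->end);
}
```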

    [dan.carpenter@oracle.com: mm/hmm/test: use after free in dmirror_allocate_chunk()]
    Link: https://lkml.kernel.org/r/20200926121402.GA7467@kadam

    Signed-off-by: Dan Williams
    Signed-off-by: Dan Carpenter
    Signed-off-by: Andrew Morton
    Reviewed-by: Boris Ostrovsky [xen]
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Benjamin Herrenschmidt
    Cc: Vishal Verma
    Cc: Vivek Goyal
    Cc: Dave Jiang
    Cc: Ben Skeggs
    Cc: David Airlie
    Cc: Daniel Vetter
    Cc: Ira Weiny
    Cc: Bjorn Helgaas
    Cc: Juergen Gross
    Cc: Stefano Stabellini
    Cc: "Jérôme Glisse"
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Ard Biesheuvel
    Cc: Borislav Petkov
    Cc: Brice Goglin
    Cc: Catalin Marinas
    Cc: Dave Hansen
    Cc: David Hildenbrand
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Hulk Robot
    Cc: Ingo Molnar
    Cc: Jason Gunthorpe
    Cc: Jason Yan
    Cc: Jeff Moyer
    Cc: Jia He
    Cc: Joao Martins
    Cc: Jonathan Cameron
    Cc: kernel test robot
    Cc: Mike Rapoport
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Randy Dunlap
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Wei Yang
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/159643103173.4062302.768998885691711532.stgit@dwillia2-desk3.amr.corp.intel.com
    Link: https://lkml.kernel.org/r/160106115761.30709.13539840236873663620.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • Pull block updates from Jens Axboe:

    - Series of merge handling cleanups (Baolin, Christoph)

    - Series of blk-throttle fixes and cleanups (Baolin)

    - Series cleaning up BDI, separating the block device from the
    backing_dev_info (Christoph)

    - Removal of bdget() as a generic API (Christoph)

    - Removal of blkdev_get() as a generic API (Christoph)

    - Cleanup of is-partition checks (Christoph)

    - Series reworking disk revalidation (Christoph)

    - Series cleaning up bio flags (Christoph)

    - bio crypt fixes (Eric)

    - IO stats inflight tweak (Gabriel)

    - blk-mq tags fixes (Hannes)

    - Buffer invalidation fixes (Jan)

    - Allow soft limits for zone append (Johannes)

    - Shared tag set improvements (John, Kashyap)

    - Allow IOPRIO_CLASS_RT for CAP_SYS_NICE (Khazhismel)

    - DM no-wait support (Mike, Konstantin)

    - Request allocation improvements (Ming)

    - Allow md/dm/bcache to use IO stat helpers (Song)

    - Series improving blk-iocost (Tejun)

    - Various cleanups (Geert, Damien, Danny, Julia, Tetsuo, Tian, Wang,
    Xianting, Yang, Yufen, yangerkun)

    * tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (191 commits)
    block: fix uapi blkzoned.h comments
    blk-mq: move cancel of hctx->run_work to the front of blk_exit_queue
    blk-mq: get rid of the dead flush handle code path
    block: get rid of unnecessary local variable
    block: fix comment and add lockdep assert
    blk-mq: use helper function to test hw stopped
    block: use helper function to test queue register
    block: remove redundant mq check
    block: invoke blk_mq_exit_sched no matter whether have .exit_sched
    percpu_ref: don't refer to ref->data if it isn't allocated
    block: ratelimit handle_bad_sector() message
    blk-throttle: Re-use the throtl_set_slice_end()
    blk-throttle: Open code __throtl_de/enqueue_tg()
    blk-throttle: Move service tree validation out of the throtl_rb_first()
    blk-throttle: Move the list operation after list validation
    blk-throttle: Fix IO hang for a corner case
    blk-throttle: Avoid tracking latency if low limit is invalid
    blk-throttle: Avoid getting the current time if tg->last_finish_time is 0
    blk-throttle: Remove a meaningless parameter for throtl_downgrade_state()
    block: Remove redundant 'return' statement
    ...

    Linus Torvalds
     

13 Oct, 2020

1 commit

  • Pull RAS updates from Borislav Petkov:

    - Extend the recovery from MCE in kernel space to processes which
    encounter an MCE in kernel space while copying from user memory, by
    sending them a SIGBUS on return to user space and unmapping the
    faulty memory, by Tony Luck and Youquan Song.

    - memcpy_mcsafe() rework by splitting the functionality into
    copy_mc_to_user() and copy_mc_to_kernel(). As a result, this enables
    support for new hardware which can recover from a machine check
    encountered during a fast string copy, makes that the default, and
    lets older hardware which does not support that advanced recovery
    opt in to the old, fragile, slow variant, by Dan Williams.

    - New AMD hw enablement, by Yazen Ghannam and Akshay Gupta.

    - Do not use MSR-tracing accessors in #MC context and flag any fault
    while accessing MCA architectural MSRs as an architectural violation
    with the hope that such hw/fw misdesigns are caught early during the
    hw eval phase and they don't make it into production.

    - Misc fixes, improvements and cleanups, as always.

    * tag 'ras_updates_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/mce: Allow for copy_mc_fragile symbol checksum to be generated
    x86/mce: Decode a kernel instruction to determine if it is copying from user
    x86/mce: Recover from poison found while copying from user space
    x86/mce: Avoid tail copy when machine check terminated a copy from user
    x86/mce: Add _ASM_EXTABLE_CPY for copy user access
    x86/mce: Provide method to find out the type of an exception handler
    x86/mce: Pass pointer to saved pt_regs to severity calculation routines
    x86/copy_mc: Introduce copy_mc_enhanced_fast_string()
    x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}()
    x86/mce: Drop AMD-specific "DEFERRED" case from Intel severity rule list
    x86/mce: Add Skylake quirk for patrol scrub reported errors
    RAS/CEC: Convert to DEFINE_SHOW_ATTRIBUTE()
    x86/mce: Annotate mce_rd/wrmsrl() with noinstr
    x86/mce/dev-mcelog: Do not update kflags on AMD systems
    x86/mce: Stop mce_reign() from re-computing severity for every CPU
    x86/mce: Make mce_rdmsrl() panic on an inaccessible MSR
    x86/mce: Increase maximum number of banks to 64
    x86/mce: Delay clearing IA32_MCG_STATUS to the end of do_machine_check()
    x86/MCE/AMD, EDAC/mce_amd: Remove struct smca_hwid.xec_bitmap
    RAS/CEC: Fix cec_init() prototype

    Linus Torvalds
     

06 Oct, 2020

1 commit

  • In reaction to a proposal to introduce a memcpy_mcsafe_fast()
    implementation, Linus points out that memcpy_mcsafe() is poorly named
    relative to communicating the scope of the interface: specifically,
    which addresses are valid to pass as source and destination, and which
    faults / exceptions are handled.

    Of particular concern is that even though x86 might be able to handle
    the semantics of copy_mc_to_user() with its common copy_user_generic()
    implementation, other archs likely need / want an explicit path for this
    case:

    On Fri, May 1, 2020 at 11:28 AM Linus Torvalds wrote:
    >
    > On Thu, Apr 30, 2020 at 6:21 PM Dan Williams wrote:
    > >
    > > However now I see that copy_user_generic() works for the wrong reason.
    > > It works because the exception on the source address due to poison
    > > looks no different than a write fault on the user address to the
    > > caller, it's still just a short copy. So it makes copy_to_user() work
    > > for the wrong reason relative to the name.
    >
    > Right.
    >
    > And it won't work that way on other architectures. On x86, we have a
    > generic function that can take faults on either side, and we use it
    > for both cases (and for the "in_user" case too), but that's an
    > artifact of the architecture oddity.
    >
    > In fact, it's probably wrong even on x86 - because it can hide bugs -
    > but writing those things is painful enough that everybody prefers
    > having just one function.

    Replace a single top-level memcpy_mcsafe() with either
    copy_mc_to_user(), or copy_mc_to_kernel().

    Introduce an x86 copy_mc_fragile() name as the rename for the
    low-level x86 implementation formerly named memcpy_mcsafe(). It is used
    as the slow / careful backend that is supplanted by a fast
    copy_mc_generic() in a follow-on patch.

    One side-effect of this reorganization is that separating copy_mc_64.S
    into its own file means that perf no longer needs to track dependencies
    for its memcpy_64.S benchmarks.
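
    To illustrate the copy_mc_to_kernel() contract (return 0 on success,
    or the number of bytes not copied after an exception), here is a
    userspace simulation. sim_copy_mc() and its poison_off parameter are
    hypothetical: real poison handling depends on the kernel's exception
    tables and cannot be reproduced in userspace.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy up to 'len' bytes and return the number of bytes NOT copied.
 * 'poison_off' simulates a machine check on the source at that offset;
 * pass a value >= len for a clean copy.
 */
static size_t sim_copy_mc(void *dst, const void *src, size_t len,
			  size_t poison_off)
{
	size_t n = poison_off < len ? poison_off : len;

	memcpy(dst, src, n);
	return len - n;	/* short copy: the caller sees the remainder */
}
```

    Reporting a remainder rather than a boolean lets callers treat a
    poisoned source the same way they already treat a short copy.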

    [ bp: Massage a bit. ]

    Signed-off-by: Dan Williams
    Signed-off-by: Borislav Petkov
    Reviewed-by: Tony Luck
    Acked-by: Michael Ellerman
    Cc:
    Link: http://lore.kernel.org/r/CAHk-=wjSqtXAqfUJxFtWNwmguFASTgB0dz1dT3V-78Quiezqbg@mail.gmail.com
    Link: https://lkml.kernel.org/r/160195561680.2163339.11574962055305783722.stgit@dwillia2-desk3.amr.corp.intel.com

    Dan Williams
     

25 Sep, 2020

1 commit

  • BDI_CAP_SYNCHRONOUS_IO is only checked in the swap code, and used to
    decide if ->rw_page can be used on a block device. Just check for
    the method instead. The only complication is that zram needs a second
    set of block_device_operations as it can switch between modes that
    actually support ->rw_page and those that don't.
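
    A sketch of "check for the method instead of a capability flag", using
    a simplified ops table. struct bdev_ops_sketch and its rw_page
    signature are hypothetical; the real ->rw_page in
    struct block_device_operations takes different arguments.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for struct block_device_operations. */
struct bdev_ops_sketch {
	int (*rw_page)(void *page, bool is_write);	/* optional method */
};

static int fake_rw_page(void *page, bool is_write)
{
	(void)page;
	(void)is_write;
	return 0;
}

/* zram-style: two tables, only one of which implements ->rw_page. */
static const struct bdev_ops_sketch ops_with_rw_page = { .rw_page = fake_rw_page };
static const struct bdev_ops_sketch ops_without_rw_page = { .rw_page = NULL };

/* The presence of the method is the capability. */
static bool can_use_rw_page(const struct bdev_ops_sketch *ops)
{
	return ops->rw_page != NULL;
}
```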

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jan Kara
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

02 Sep, 2020

1 commit

  • The nvdimm block driver abuses revalidate_disk in a strange way that is
    totally unrelated to what other drivers do. Simplify this by just
    calling nvdimm_revalidate_disk (which seems rather misnamed) from the
    probe routines, as the additional bdev size revalidation is pointless
    at this point, and remove the revalidate_disk methods, given that
    they can only be triggered from add_disk, which is called right before
    the manual calls.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Josef Bacik
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

18 Aug, 2020

1 commit

  • Because the last member of the "nvdimm_firmware_attributes" array
    was not assigned a NULL pointer, the traversal of the "grp->attrs"
    array in the "create_files" function runs out of bounds.

    func:
    create_files:
    ->for (i = 0, attr = grp->attrs; *attr && !error; i++, attr++)
    ->....
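
    The loop above stops only at a NULL entry, so every attribute array
    handed to it needs a trailing NULL sentinel. A userspace sketch of the
    corrected shape, with hypothetical names (struct attr_sketch,
    count_attrs()):

```c
#include <stddef.h>

struct attr_sketch {
	const char *name;
};

static struct attr_sketch attr_first = { "first" };
static struct attr_sketch attr_second = { "second" };

/* The trailing NULL is the sentinel the loop below relies on. */
static struct attr_sketch *attrs[] = {
	&attr_first,
	&attr_second,
	NULL,
};

/* Same loop shape as create_files(): walk until the first NULL entry. */
static int count_attrs(struct attr_sketch **attr)
{
	int i;

	for (i = 0; *attr; i++, attr++)
		;	/* a real implementation would create a file here */
	return i;
}
```

    Without the NULL entry, the loop reads past the end of the array,
    which is exactly the global-out-of-bounds read reported below.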

    BUG: KASAN: global-out-of-bounds in create_files fs/sysfs/group.c:43 [inline]
    BUG: KASAN: global-out-of-bounds in internal_create_group+0x9d8/0xb20
    fs/sysfs/group.c:149
    Read of size 8 at addr ffffffff8a2e4cf0 by task kworker/u17:10/959

    CPU: 2 PID: 959 Comm: kworker/u17:10 Not tainted 5.8.0-syzkaller #0
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
    BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
    Workqueue: events_unbound async_run_entry_fn
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x18f/0x20d lib/dump_stack.c:118
    print_address_description.constprop.0.cold+0x5/0x497 mm/kasan/report.c:383
    __kasan_report mm/kasan/report.c:513 [inline]
    kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
    create_files fs/sysfs/group.c:43 [inline]
    internal_create_group+0x9d8/0xb20 fs/sysfs/group.c:149
    internal_create_groups.part.0+0x90/0x140 fs/sysfs/group.c:189
    internal_create_groups fs/sysfs/group.c:185 [inline]
    sysfs_create_groups+0x25/0x50 fs/sysfs/group.c:215
    device_add_groups drivers/base/core.c:2024 [inline]
    device_add_attrs drivers/base/core.c:2178 [inline]
    device_add+0x7fd/0x1c40 drivers/base/core.c:2881
    nd_async_device_register+0x12/0x80 drivers/nvdimm/bus.c:506
    async_run_entry_fn+0x121/0x530 kernel/async.c:123
    process_one_work+0x94c/0x1670 kernel/workqueue.c:2269
    worker_thread+0x64c/0x1120 kernel/workqueue.c:2415
    kthread+0x3b5/0x4a0 kernel/kthread.c:292
    ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294

    The buggy address belongs to the variable:
    nvdimm_firmware_attributes+0x10/0x40

    Link: https://lore.kernel.org/r/20200812085501.30963-1-qiang.zhang@windriver.com
    Link: https://lore.kernel.org/r/20200814150509.225615-1-vaibhav@linux.ibm.com
    Fixes: 48001ea50d17f ("PM, libnvdimm: Add runtime firmware activation support")
    Reported-by: syzbot+1cf0ffe61aecf46f588f@syzkaller.appspotmail.com
    Reported-by: Sandipan Das
    Reported-by: Vaibhav Jain
    Reviewed-by: Ira Weiny
    Signed-off-by: Zqiang
    Signed-off-by: Vishal Verma

    Zqiang
     

15 Aug, 2020

1 commit

  • This function returns the number of bytes in a THP. It is like
    page_size(), but compiles to just PAGE_SIZE if CONFIG_TRANSPARENT_HUGEPAGE
    is disabled.
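
    A sketch of the size computation, assuming 4 KiB base pages and that a
    THP of a given order spans PAGE_SIZE << order bytes.
    thp_size_sketch() and its thp_enabled parameter are hypothetical; in
    the kernel the PAGE_SIZE fallback is selected at compile time by
    CONFIG_TRANSPARENT_HUGEPAGE rather than by a runtime argument.

```c
#define PAGE_SIZE_SKETCH 4096UL	/* assumed 4 KiB base pages */

/* Bytes in a THP of the given order; PAGE_SIZE when THP is disabled. */
static unsigned long thp_size_sketch(unsigned int order, int thp_enabled)
{
	if (!thp_enabled)
		return PAGE_SIZE_SKETCH;
	return PAGE_SIZE_SKETCH << order;
}
```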

    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Andrew Morton
    Reviewed-by: William Kucharski
    Reviewed-by: Zi Yan
    Cc: David Hildenbrand
    Cc: Mike Kravetz
    Cc: "Kirill A. Shutemov"
    Link: http://lkml.kernel.org/r/20200629151959.15779-5-willy@infradead.org
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     

12 Aug, 2020

2 commits

  • Pull virtio updates from Michael Tsirkin:

    - IRQ bypass support for vdpa and IFC

    - MLX5 vdpa driver

    - Endianness fixes for virtio drivers

    - Misc other fixes

    * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (71 commits)
    vdpa/mlx5: fix up endian-ness for mtu
    vdpa: Fix pointer math bug in vdpasim_get_config()
    vdpa/mlx5: Fix pointer math in mlx5_vdpa_get_config()
    vdpa/mlx5: fix memory allocation failure checks
    vdpa/mlx5: Fix uninitialised variable in core/mr.c
    vdpa_sim: init iommu lock
    virtio_config: fix up warnings on parisc
    vdpa/mlx5: Add VDPA driver for supported mlx5 devices
    vdpa/mlx5: Add shared memory registration code
    vdpa/mlx5: Add support library for mlx5 VDPA implementation
    vdpa/mlx5: Add hardware descriptive header file
    vdpa: Modify get_vq_state() to return error code
    net/vdpa: Use struct for set/get vq state
    vdpa: remove hard coded virtq num
    vdpasim: support batch updating
    vhost-vdpa: support IOTLB batching hints
    vhost-vdpa: support get/set backend features
    vhost: generialize backend features setting/getting
    vhost-vdpa: refine ioctl pre-processing
    vDPA: dont change vq irq after DRIVER_OK
    ...

    Linus Torvalds
     
  • Pull libnvdimm updates from Vishal Verma:
    "You'd normally receive this pull request from Dan Williams, but he's
    busy watching a newborn (Congrats Dan!), so I'm watching libnvdimm
    this cycle.

    This adds a new feature in libnvdimm - 'Runtime Firmware Activation',
    and a few small cleanups and fixes in libnvdimm and DAX. I'd
    originally intended to make separate topic-based pull requests - one
    for libnvdimm, and one for DAX, but some of the DAX material fell out
    since it wasn't quite ready.

    Summary:

    - add 'Runtime Firmware Activation' support for NVDIMMs that
    advertise the relevant capability

    - misc libnvdimm and DAX cleanups"

    * tag 'libnvdimm-for-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
    libnvdimm/security: ensure sysfs poll thread woke up and fetch updated attr
    libnvdimm/security: the 'security' attr never show 'overwrite' state
    libnvdimm/security: fix a typo
    ACPI: NFIT: Fix ARS zero-sized allocation
    dax: Fix incorrect argument passed to xas_set_err()
    ACPI: NFIT: Add runtime firmware activate support
    PM, libnvdimm: Add runtime firmware activation support
    libnvdimm: Convert to DEVICE_ATTR_ADMIN_RO()
    drivers/dax: Expand lock scope to cover the use of addresses
    fs/dax: Remove unused size parameter
    dax: print error message by pr_info() in __generic_fsdax_supported()
    driver-core: Introduce DEVICE_ATTR_ADMIN_{RO,RW}
    tools/testing/nvdimm: Emulate firmware activation commands
    tools/testing/nvdimm: Prepare nfit_ctl_test() for ND_CMD_CALL emulation
    tools/testing/nvdimm: Add command debug messages
    tools/testing/nvdimm: Cleanup dimm index passing
    ACPI: NFIT: Define runtime firmware activation commands
    ACPI: NFIT: Move bus_dsm_mask out of generic nvdimm_bus_descriptor
    libnvdimm: Validate command family indices

    Linus Torvalds
     

08 Aug, 2020

1 commit

  • Pull powerpc updates from Michael Ellerman:

    - Add support for (optionally) using queued spinlocks & rwlocks.

    - Support for a new faster system call ABI using the scv instruction on
    Power9 or later.

    - Drop support for the PROT_SAO mmap/mprotect flag as it will be
    unsupported on Power10 and future processors, leaving us with no way
    to implement the functionality it requests. This risks breaking
    userspace, though we believe it is unused in practice.

    - A bug fix for, and then the removal of, our custom stack expansion
    checking. We now allow stack expansion up to the rlimit, like other
    architectures.

    - Remove the remnants of our (previously disabled) topology update
    code, which tried to react to NUMA layout changes on virtualised
    systems, but was prone to crashes and other problems.

    - Add PMU support for Power10 CPUs.

    - A change to our signal trampoline so that we don't unbalance the link
    stack (branch return predictor) in the signal delivery path.

    - Lots of other cleanups, refactorings, smaller features and so on as
    usual.

    Thanks to: Abhishek Goel, Alastair D'Silva, Alexander A. Klimov, Alexey
    Kardashevskiy, Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju
    T Sudhakar, Anton Blanchard, Arnd Bergmann, Athira Rajeev, Balamuruhan
    S, Bharata B Rao, Bill Wendling, Bin Meng, Cédric Le Goater, Chris
    Packham, Christophe Leroy, Christoph Hellwig, Daniel Axtens, Dan
    Williams, David Lamparter, Desnes A. Nunes do Rosario, Erhard F., Finn
    Thain, Frederic Barrat, Ganesh Goudar, Gautham R. Shenoy, Geoff Levand,
    Greg Kurz, Gustavo A. R. Silva, Hari Bathini, Harish, Imre Kaloz, Joel
    Stanley, Joe Perches, John Crispin, Jordan Niethe, Kajol Jain, Kamalesh
    Babulal, Kees Cook, Laurent Dufour, Leonardo Bras, Li RongQing, Madhavan
    Srinivasan, Mahesh Salgaonkar, Mark Cave-Ayland, Michal Suchanek, Milton
    Miller, Mimi Zohar, Murilo Opsfelder Araujo, Nathan Chancellor, Nathan
    Lynch, Naveen N. Rao, Nayna Jain, Nicholas Piggin, Oliver O'Halloran,
    Palmer Dabbelt, Pedro Miraglia Franco de Carvalho, Philippe Bergheaud,
    Pingfan Liu, Pratik Rajesh Sampat, Qian Cai, Qinglang Miao, Randy
    Dunlap, Ravi Bangoria, Sachin Sant, Sam Bobroff, Sandipan Das, Santosh
    Sivaraj, Satheesh Rajendran, Shirisha Ganta, Sourabh Jain, Srikar
    Dronamraju, Stan Johnson, Stephen Rothwell, Thadeu Lima de Souza
    Cascardo, Thiago Jung Bauermann, Tom Lane, Vaibhav Jain, Vladis Dronov,
    Wei Yongjun, Wen Xiong, YueHaibing.

    * tag 'powerpc-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (337 commits)
    selftests/powerpc: Fix pkey syscall redefinitions
    powerpc: Fix circular dependency between percpu.h and mmu.h
    powerpc/powernv/sriov: Fix use of uninitialised variable
    selftests/powerpc: Skip vmx/vsx/tar/etc tests on older CPUs
    powerpc/40x: Fix assembler warning about r0
    powerpc/papr_scm: Add support for fetching nvdimm 'fuel-gauge' metric
    powerpc/papr_scm: Fetch nvdimm performance stats from PHYP
    cpuidle: pseries: Fixup exit latency for CEDE(0)
    cpuidle: pseries: Add function to parse extended CEDE records
    cpuidle: pseries: Set the latency-hint before entering CEDE
    selftests/powerpc: Fix online CPU selection
    powerpc/perf: Consolidate perf_callchain_user_[64|32]()
    powerpc/pseries/hotplug-cpu: Remove double free in error path
    powerpc/pseries/mobility: Add pr_debug() for device tree changes
    powerpc/pseries/mobility: Set pr_fmt()
    powerpc/cacheinfo: Warn if cache object chain becomes unordered
    powerpc/cacheinfo: Improve diagnostics about malformed cache lists
    powerpc/cacheinfo: Use name@unit instead of full DT path in debug messages
    powerpc/cacheinfo: Set pr_fmt()
    powerpc: fix function annotations to avoid section mismatch warnings with gcc-10
    ...

    Linus Torvalds
     

05 Aug, 2020

1 commit


04 Aug, 2020

4 commits

  • commit 7d988097c546 ("acpi/nfit, libnvdimm/security: Add security DSM overwrite support")
    adds a sysfs_notify_dirent() to wake up the userspace poll thread when the "overwrite"
    operation has completed. But the notification is issued before the internal
    dimm security state and flags have been updated, so the userspace poll thread
    wakes up, fetches the not-yet-updated attr, and goes back to sleep forever.
    Yet if a user issues "ndctl wait-overwrite nmemX" again from another terminal,
    the command returns instantly.
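
    The ordering problem reduces to a deterministic userspace sketch: a
    waiter that re-reads state once when woken sees a stale value if it is
    notified before the state is updated. struct dimm_sketch and the
    helpers are hypothetical stand-ins for the dimm security state and
    sysfs_notify_dirent().

```c
#include <stdbool.h>

/* Hypothetical device state observed by a poll-style waiter. */
struct dimm_sketch {
	bool overwrite_busy;
	bool waiter_done;	/* what the woken waiter concluded */
};

/* Waiter: re-reads the state once per wakeup. */
static void waiter_wake(struct dimm_sketch *d)
{
	d->waiter_done = !d->overwrite_busy;
}

/* Buggy order: notify first, then update the state. */
static void complete_buggy(struct dimm_sketch *d)
{
	waiter_wake(d);		/* sees stale 'busy', goes back to sleep */
	d->overwrite_busy = false;
}

/* Fixed order: update the state, then notify. */
static void complete_fixed(struct dimm_sketch *d)
{
	d->overwrite_busy = false;
	waiter_wake(d);		/* sees the final state */
}
```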

    Link: https://lore.kernel.org/r/1596494499-9852-3-git-send-email-jane.chu@oracle.com
    Fixes: 7d988097c546 ("acpi/nfit, libnvdimm/security: Add security DSM overwrite support")
    Cc: Dave Jiang
    Cc: Dan Williams
    Reviewed-by: Dave Jiang
    Signed-off-by: Jane Chu
    Signed-off-by: Vishal Verma

    Jane Chu
     
  • The 'security' attribute displays the security state of an nvdimm.
    During normal operation, the nvdimm state may be one of 'disabled',
    'unlocked' or 'locked'. When an admin issues
    # ndctl sanitize-dimm nmem0 --overwrite
    the attribute is expected to change to 'overwrite' until the overwrite
    operation completes.

    But tests on our systems show that 'overwrite' is never shown during
    the overwrite operation. i.e.
    # cat /sys/devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/nmem0/security
    unlocked
    the attribute remains 'unlocked' throughout the operation; consequently
    the "ndctl wait-overwrite nmem0" command doesn't wait at all.

    The driver tracks the state in 'nvdimm->sec.flags': when the operation
    starts, it adds an overwrite bit to the flags; and when the operation
    completes, it removes the bit. Hence security_show() should check the
    'overwrite' bit first, in order to indicate the actual state when multiple
    bits are set in the flags.
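
    A sketch of the check ordering described above, using hypothetical bit
    numbers and a plain unsigned long in place of nvdimm->sec.flags. The
    "unsupported" fallback is an assumption of this sketch.

```c
/* Hypothetical bit numbers standing in for the real security flags. */
enum {
	SEC_DISABLED,
	SEC_UNLOCKED,
	SEC_LOCKED,
	SEC_OVERWRITE,
};

#define SEC_BIT(b) (1UL << (b))

/* Check 'overwrite' first so it wins when several bits are set. */
static const char *security_state(unsigned long flags)
{
	if (flags & SEC_BIT(SEC_OVERWRITE))
		return "overwrite";
	if (flags & SEC_BIT(SEC_DISABLED))
		return "disabled";
	if (flags & SEC_BIT(SEC_UNLOCKED))
		return "unlocked";
	if (flags & SEC_BIT(SEC_LOCKED))
		return "locked";
	return "unsupported";
}
```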

    Link: https://lore.kernel.org/r/1596494499-9852-2-git-send-email-jane.chu@oracle.com
    Reviewed-by: Dave Jiang
    Signed-off-by: Jane Chu
    Signed-off-by: Vishal Verma

    Jane Chu
     
  • commit d78c620a2e82 ("libnvdimm/security: Introduce a 'frozen' attribute")
    introduced a typo, causing an update to 'nvdimm->sec.flags' to be overwritten
    by the subsequent update meant for 'nvdimm->sec.ext_flags'.
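
    This class of typo can be illustrated with a hypothetical two-field
    struct: the second assignment lands on the wrong field and clobbers
    the first. Which values belong in flags versus ext_flags (user versus
    master below) is an assumption of this sketch.

```c
/* Hypothetical stand-in for the two fields in nvdimm->sec. */
struct sec_sketch {
	unsigned long flags;
	unsigned long ext_flags;
};

/* The bug: the second assignment targets 'flags' instead of 'ext_flags'. */
static void refresh_buggy(struct sec_sketch *s, unsigned long user,
			  unsigned long master)
{
	s->flags = user;
	s->flags = master;	/* typo: clobbers the value just stored */
}

static void refresh_fixed(struct sec_sketch *s, unsigned long user,
			  unsigned long master)
{
	s->flags = user;
	s->ext_flags = master;
}
```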

    Link: https://lore.kernel.org/r/1596494499-9852-1-git-send-email-jane.chu@oracle.com
    Fixes: d78c620a2e82 ("libnvdimm/security: Introduce a 'frozen' attribute")
    Cc: Dan Williams
    Reviewed-by: Dave Jiang
    Signed-off-by: Jane Chu
    Signed-off-by: Vishal Verma

    Jane Chu
     
  • Pull core block updates from Jens Axboe:
    "Good amount of cleanups and tech debt removals in here, and as a
    result, the diffstat shows a nice net reduction in code.

    - Softirq completion cleanups (Christoph)

    - Stop using ->queuedata (Christoph)

    - Cleanup bd claiming (Christoph)

    - Use check_events, moving away from the legacy media change
    (Christoph)

    - Use inode i_blkbits consistently (Christoph)

    - Remove old unused writeback congestion bits (Christoph)

    - Cleanup/unify submission path (Christoph)

    - Use bio_uninit consistently, instead of bio_disassociate_blkg
    (Christoph)

    - sbitmap cleared bits handling (John)

    - Request merging blktrace event addition (Jan)

    - sysfs add/remove race fixes (Luis)

    - blk-mq tag fixes/optimizations (Ming)

    - Duplicate words in comments (Randy)

    - Flush deferral cleanup (Yufen)

    - IO context locking/retry fixes (John)

    - struct_size() usage (Gustavo)

    - blk-iocost fixes (Chengming)

    - blk-cgroup IO stats fixes (Boris)

    - Various little fixes"

    * tag 'for-5.9/block-20200802' of git://git.kernel.dk/linux-block: (135 commits)
    block: blk-timeout: delete duplicated word
    block: blk-mq-sched: delete duplicated word
    block: blk-mq: delete duplicated word
    block: genhd: delete duplicated words
    block: elevator: delete duplicated word and fix typos
    block: bio: delete duplicated words
    block: bfq-iosched: fix duplicated word
    iocost_monitor: start from the oldest usage index
    iocost: Fix check condition of iocg abs_vdebt
    block: Remove callback typedefs for blk_mq_ops
    block: Use non _rcu version of list functions for tag_set_list
    blk-cgroup: show global disk stats in root cgroup io.stat
    blk-cgroup: make iostat functions visible to stat printing
    block: improve discard bio alignment in __blkdev_issue_discard()
    block: change REQ_OP_ZONE_RESET and REQ_OP_ZONE_RESET_ALL to be odd numbers
    block: defer flush request no matter whether we have elevator
    block: make blk_timeout_init() static
    block: remove retry loop in ioc_release_fn()
    block: remove unnecessary ioc nested locking
    block: integrate bd_start_claiming into __blkdev_get
    ...

    Linus Torvalds
     

29 Jul, 2020

3 commits

  • Plumb the platform specific backend for the generic libnvdimm firmware
    activate interface. Register dimm level operations to arm/disarm
    activation, and register bus level operations to report the dynamic
    platform-quiesce time relative to the number of dimms armed for firmware
    activation.

    A new nfit-specific bus attribute "firmware_activate_noidle" is added to
    allow the activation to switch between platform-enforced and
    OS-opportunistic device quiesce. In other words, let the hibernate cycle
    handle in-flight device-dma rather than the platform attempting to
    increase PCI-E timeouts and the like.

    Cc: Dave Jiang
    Cc: Ira Weiny
    Cc: Vishal Verma
    Signed-off-by: Dan Williams
    Signed-off-by: Vishal Verma

    Dan Williams
     
  • Abstract platform specific mechanics for nvdimm firmware activation
    behind a handful of generic ops. At the bus level ->activate_state()
    indicates the unified state (idle, busy, armed) of all DIMMs on the bus,
    and ->capability() indicates the system state expectations for activate.
    At the DIMM level ->activate_state() indicates the per-DIMM state,
    ->activate_result() indicates the outcome of the last activation
    attempt, and ->arm() attempts to transition the DIMM from 'idle' to
    'armed'.

    A new hibernate_quiet_exec() facility is added to support firmware
    activation in an OS defined system quiesce state. It leverages the fact
    that the hibernate-freeze state wants to assert that a memory
    hibernation snapshot can be taken. This is in contrast to a platform
    firmware defined quiesce state that may forcefully quiet the memory
    controller independent of whether an individual device-driver properly
    supports hibernate-freeze.

    The libnvdimm sysfs interface is extended to support detection of a
    firmware activate capability. The mechanism supports enumeration and
    triggering of firmware activate, optionally in the
    hibernate_quiet_exec() context.
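
    A rough sketch of the ops abstraction described above: a per-DIMM
    arm() moves a DIMM from 'idle' to 'armed', and a bus-level state is
    derived from all DIMMs. The struct, the callback signatures, the
    -EBUSY refusal and the aggregation rule (any busy DIMM makes the bus
    busy, otherwise any armed DIMM makes it armed) are all assumptions,
    not libnvdimm's actual definitions.

```c
#include <errno.h>
#include <stdbool.h>

/* States named in the commit message. */
enum fwa_state { FWA_IDLE, FWA_BUSY, FWA_ARMED };

/* Hypothetical per-DIMM ops table. */
struct dimm_fwa_ops_sketch {
	enum fwa_state (*activate_state)(void *dimm);
	int (*arm)(void *dimm, bool arm);
};

struct fake_dimm {
	enum fwa_state state;
};

static enum fwa_state fake_state(void *dimm)
{
	return ((struct fake_dimm *)dimm)->state;
}

/* idle -> armed when arming, armed -> idle when disarming. */
static int fake_arm(void *dimm, bool arm)
{
	struct fake_dimm *d = dimm;

	if (d->state == FWA_BUSY)
		return -EBUSY;
	d->state = arm ? FWA_ARMED : FWA_IDLE;
	return 0;
}

static const struct dimm_fwa_ops_sketch fake_ops = {
	.activate_state = fake_state,
	.arm = fake_arm,
};

/* Unified bus state computed from every DIMM's state. */
static enum fwa_state bus_state(struct fake_dimm *dimms, int n)
{
	enum fwa_state s = FWA_IDLE;

	for (int i = 0; i < n; i++) {
		enum fwa_state d = fake_ops.activate_state(&dimms[i]);

		if (d == FWA_BUSY)
			return FWA_BUSY;
		if (d == FWA_ARMED)
			s = FWA_ARMED;
	}
	return s;
}
```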

    [rafael: hibernate_quiet_exec() proposal]
    [vishal: fix up sparse warning, grammar in Documentation/]

    Cc: Pavel Machek
    Cc: Ira Weiny
    Cc: Len Brown
    Cc: Jonathan Corbet
    Cc: Dave Jiang
    Cc: Vishal Verma
    Reported-by: kernel test robot
    Co-developed-by: "Rafael J. Wysocki"
    Signed-off-by: "Rafael J. Wysocki"
    Signed-off-by: Dan Williams
    Signed-off-by: Vishal Verma

    Dan Williams
     
  • Move libnvdimm sysfs attributes that currently use an open coded
    DEVICE_ATTR() to hide sensitive root-only information (physical memory
    layout) to the new DEVICE_ATTR_ADMIN_RO() helper.
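
    DEVICE_ATTR_ADMIN_RO() is the real kernel helper. The user-space
    macro below only imitates its shape: it declares a read-only,
    root-only attribute (mode 0400 in this sketch) and wires up
    <name>_show by naming convention. struct mock_attr,
    MOCK_ATTR_ADMIN_RO() and the fake address printed by resource_show()
    are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a sysfs attribute descriptor. */
struct mock_attr {
	const char *name;
	unsigned int mode;
	int (*show)(char *buf, size_t len);
};

/* Root-only read (0400); the show() callback is found by token pasting. */
#define MOCK_ATTR_ADMIN_RO(_name) \
	struct mock_attr mock_attr_##_name = { #_name, 0400, _name##_show }

/* Example "sensitive" attribute: a fake physical address. */
static int resource_show(char *buf, size_t len)
{
	return snprintf(buf, len, "%#llx\n", 0x100000000ULL);
}

/* Expands to: static struct mock_attr mock_attr_resource = { ... }; */
static MOCK_ATTR_ADMIN_RO(resource);
```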

    Cc: Vishal Verma
    Cc: Dave Jiang
    Cc: Ira Weiny
    Signed-off-by: Dan Williams
    Signed-off-by: Vishal Verma

    Dan Williams
     

26 Jul, 2020

1 commit

  • The ND_CMD_CALL format allows for a general passthrough of passlisted
    commands targeting a given command set. However, there is no validation
    of the family index relative to what the bus supports.

    - Update the NFIT bus implementation (the only one that supports
    ND_CMD_CALL passthrough) to also passlist the valid set of command
    family indices.

    - Update the generic __nd_ioctl() path to validate that field on behalf
    of all implementations.
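
    The sketch below shows the kind of check the generic path can apply:
    the bus publishes a bitmask of allowed family indices, and any
    ND_CMD_CALL-style request whose family is out of range or not in the
    mask is rejected. struct mock_bus_desc, MOCK_FAMILY_MAX and
    validate_family() are hypothetical names. The limit of 31 is an
    assumption chosen so the mask fits in an unsigned long.

```c
#include <errno.h>

/* Hypothetical upper bound on command-family indices (assumption). */
#define MOCK_FAMILY_MAX 31

/* Hypothetical bus descriptor carrying a passlist of family indices. */
struct mock_bus_desc {
	unsigned long family_mask; /* bit N set => family N is allowed */
};

/*
 * Generic-path check: reject a family index that is out of range or
 * not passlisted by the bus implementation.
 */
static int validate_family(const struct mock_bus_desc *desc, unsigned int family)
{
	if (family > MOCK_FAMILY_MAX)
		return -EINVAL;
	if (!(desc->family_mask & (1UL << family)))
		return -EINVAL;
	return 0;
}
```

    The range check comes first so that the shift never exceeds the
    width of the mask.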

    Fixes: 31eca76ba2fc ("nfit, libnvdimm: limited/whitelisted dimm command marshaling mechanism")
    Cc: Vishal Verma
    Cc: Dave Jiang
    Cc: Ira Weiny
    Cc: "Rafael J. Wysocki"
    Cc: Len Brown
    Cc:
    Signed-off-by: Dan Williams
    Signed-off-by: Vishal Verma

    Dan Williams
     

16 Jul, 2020

2 commits

  • With the kernel now supporting the new pmem flush/sync instructions, we
    can enable the kernel to initialize the device. On P10, these devices
    appear with a new compatible string. For PAPR devices we have

    compatible "ibm,pmemory-v2"

    and for OF pmem device we have

    compatible "pmem-region-v2"
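
    The user-space sketch below shows how a sentinel-terminated match
    table can accept both the original and the "-v2" strings. For brevity,
    the PAPR and OF pmem strings share one table here, although in the
    kernel they belong to different drivers. struct mock_of_id and
    match_compatible() are hypothetical stand-ins for the kernel's OF
    matching. Only the "-v2" strings come from the text above. The
    original "ibm,pmemory" and "pmem-region" entries are included to
    show both generations side by side.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an OF match-table entry. */
struct mock_of_id {
	const char *compatible;
};

/* Original and P10 ("-v2") compatible strings, sentinel-terminated. */
static const struct mock_of_id pmem_match[] = {
	{ .compatible = "ibm,pmemory" },
	{ .compatible = "ibm,pmemory-v2" },
	{ .compatible = "pmem-region" },
	{ .compatible = "pmem-region-v2" },
	{ NULL },
};

/* Return the matching entry, or NULL if the node is not supported. */
static const struct mock_of_id *match_compatible(const char *compat)
{
	for (const struct mock_of_id *id = pmem_match; id->compatible; id++)
		if (strcmp(id->compatible, compat) == 0)
			return id;
	return NULL;
}
```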

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20200701072235.223558-8-aneesh.kumar@linux.ibm.com

    Aneesh Kumar K.V
     
  • Architectures like ppc64 provide persistent memory specific barriers
    that will ensure that all stores for which the modifications are
    written to persistent storage by preceding dcbfps and dcbstps
    instructions have updated persistent storage before any data
    access or data transfer caused by subsequent instructions is initiated.
    This is in addition to the ordering done by wmb().

    Update the nvdimm core so that an architecture can use barriers other
    than wmb() to ensure all previous writes are architecturally visible
    for the platform buffer flush.
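
    The override pattern can be sketched in user space as a generic
    default that an architecture header may replace by defining the macro
    first. mock_pmem_wmb(), mock_flush() and mock_flush_hint are
    hypothetical names. C11 atomic_thread_fence() followed by a relaxed
    atomic store stands in for wmb() plus the flush trigger. It does not
    reproduce ppc64's persistent-memory barrier.

```c
#include <stdatomic.h>

/*
 * Generic fallback, used only if an architecture header has not already
 * defined mock_pmem_wmb() (e.g. to a persistent-memory barrier).
 */
#ifndef mock_pmem_wmb
#define mock_pmem_wmb() atomic_thread_fence(memory_order_release)
#endif

/* Stands in for the write that triggers the platform buffer flush. */
static atomic_int mock_flush_hint;

/* Order the data store before the flush trigger. */
static void mock_flush(int *data, int value)
{
	*data = value;
	mock_pmem_wmb();
	atomic_store_explicit(&mock_flush_hint, 1, memory_order_relaxed);
}
```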

    Signed-off-by: Aneesh Kumar K.V
    Reviewed-by: Dan Williams
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20200701072235.223558-5-aneesh.kumar@linux.ibm.com

    Aneesh Kumar K.V
     

09 Jul, 2020

1 commit

  • As of commit 8c0637e950d6 ("keys: Make the KEY_NEED_* perms an enum rather
    than a mask") lookup_user_key() needs an explicit declaration of what it
    wants to do with the key. Add KEY_NEED_SEARCH to fix a warning with the
    signature below, and to fix the inability to retrieve a key.

    WARNING: CPU: 15 PID: 6276 at security/keys/permission.c:35 key_task_permission+0xd3/0x140
    [..]
    RIP: 0010:key_task_permission+0xd3/0x140
    [..]
    Call Trace:
    lookup_user_key+0xeb/0x6b0
    ? vsscanf+0x3df/0x840
    ? key_validate+0x50/0x50
    ? key_default_cmp+0x20/0x20
    nvdimm_get_user_key_payload.part.0+0x21/0x110 [libnvdimm]
    nvdimm_security_store+0x67d/0xb20 [libnvdimm]
    security_store+0x67/0x1a0 [libnvdimm]
    kernfs_fop_write+0xcf/0x1c0
    vfs_write+0xde/0x1d0
    ksys_write+0x68/0xe0
    do_syscall_64+0x5c/0xa0
    entry_SYSCALL_64_after_hwframe+0x49/0xb3
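
    The user-space sketch below illustrates why an explicit need matters
    once permissions are an enum rather than a mask. The checker can no
    longer treat "no bits requested" as a trivially satisfied request, so
    an unspecified need is an error. The enum, the permission bits and
    mock_key_permission() are hypothetical, and -EINVAL for the
    unspecified case is an assumption, not the keys subsystem's
    behaviour.

```c
#include <errno.h>

/* Hypothetical permission enum: callers must name a concrete need. */
enum mock_key_need { MOCK_NEED_UNSPECIFIED, MOCK_NEED_READ, MOCK_NEED_SEARCH };

/* Hypothetical permission bits held by a key. */
#define MOCK_PERM_READ   0x1u
#define MOCK_PERM_SEARCH 0x2u

static int mock_key_permission(unsigned int key_perm, enum mock_key_need need)
{
	switch (need) {
	case MOCK_NEED_READ:
		return (key_perm & MOCK_PERM_READ) ? 0 : -EACCES;
	case MOCK_NEED_SEARCH:
		return (key_perm & MOCK_PERM_SEARCH) ? 0 : -EACCES;
	default:
		/* Unspecified need: a caller bug (the trace above warns here). */
		return -EINVAL;
	}
}
```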

    Fixes: 8c0637e950d6 ("keys: Make the KEY_NEED_* perms an enum rather than a mask")
    Suggested-by: David Howells
    Reviewed-by: Dave Jiang
    Reviewed-by: Ira Weiny
    Cc: Dan Williams
    Cc: Vishal Verma
    Cc: Dave Jiang
    Cc: Ira Weiny
    Link: https://lore.kernel.org/r/159297332630.1304143.237026690015653759.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Dan Williams

    Dan Williams
     

01 Jul, 2020

1 commit

  • The make_request_fn is a little weird in that it sits directly in
    struct request_queue instead of an operation vector. Replace it with
    a block_device_operations method called submit_bio (which better
    describes what it does). Also remove the request_queue argument to it, as
    the queue can be derived pretty trivially from the bio.
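
    The structural change can be modelled in user space: the submit method
    moves into an ops vector hung off the disk, and the driver finds its
    queue through the bio instead of receiving it as an argument. All
    mock_ types and functions are hypothetical and do not reflect the
    kernel's actual struct layout or return types.

```c
/* Hypothetical types modelling the relationship, not the kernel layout. */
struct mock_bio;

struct mock_queue {
	int nr_submitted;
};

/* The submit method lives in the ops vector, not in the queue. */
struct mock_bdev_ops {
	void (*submit_bio)(struct mock_bio *bio);
};

struct mock_disk {
	const struct mock_bdev_ops *fops;
	struct mock_queue *queue;
};

struct mock_bio {
	struct mock_disk *disk;
};

/* No queue argument: the driver derives it from the bio. */
static void mock_pmem_submit_bio(struct mock_bio *bio)
{
	struct mock_queue *q = bio->disk->queue;

	q->nr_submitted++;
}

static const struct mock_bdev_ops mock_pmem_fops = {
	.submit_bio = mock_pmem_submit_bio,
};

/* Generic submission dispatches through the disk's ops vector. */
static void mock_submit(struct mock_bio *bio)
{
	bio->disk->fops->submit_bio(bio);
}
```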

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

18 Jun, 2020

1 commit

  • It is possible that a platform that is capable of 'namespace labels'
    comes up without the labels properly initialized. In this case, the
    region's 'align' attribute is hidden. However, once the user does
    initialize the labels, the 'align' attribute still stays hidden, which is
    unexpected.

    The sysfs_update_group() API is meant to address this, and could be
    called during region probe, but it has entanglements with the device
    'lockdep_mutex'. Therefore, simply make the 'align' attribute always
    visible. It doesn't matter what it says for label-less namespaces, since
    it is not possible to change their allocation anyway.

    Suggested-by: Dan Williams
    Signed-off-by: Vishal Verma
    Cc: Dan Williams
    Link: https://lore.kernel.org/r/20200520225026.29426-1-vishal.l.verma@intel.com
    Signed-off-by: Dan Williams

    Vishal Verma
     

14 Jun, 2020

1 commit


09 Jun, 2020

1 commit

  • This seems to lead to some crazy include loops when using
    asm-generic/cacheflush.h on more architectures, so leave it to the arch
    header for now.

    [hch@lst.de: fix warning]
    Link: http://lkml.kernel.org/r/20200520173520.GA11199@lst.de

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Cc: Will Deacon
    Cc: Nick Piggin
    Cc: Peter Zijlstra
    Cc: Jeff Dike
    Cc: Richard Weinberger
    Cc: Anton Ivanov
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Borislav Petkov
    Cc: "H. Peter Anvin"
    Cc: Dan Williams
    Cc: Vishal Verma
    Cc: Dave Jiang
    Cc: Keith Busch
    Cc: Ira Weiny
    Cc: Arnd Bergmann
    Link: http://lkml.kernel.org/r/20200515143646.3857579-7-hch@lst.de
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

27 May, 2020

1 commit


14 May, 2020

3 commits


09 Apr, 2020

1 commit

  • Pull libnvdimm and dax updates from Dan Williams:
    "There were multiple touches outside of drivers/nvdimm/ this round to
    add cross arch compatibility to the devm_memremap_pages() interface,
    enhance numa information for persistent memory ranges, and add a
    zero_page_range() dax operation.

    This cycle I switched from the patchwork api to Konstantin's b4 script
    for collecting tags (from x86, PowerPC, filesystem, and device-mapper
    folks), and everything looks to have gone ok there. This has all
    appeared in -next with no reported issues.

    Summary:

    - Add support for region alignment configuration and enforcement to
    fix compatibility across architectures and PowerPC page size
    configurations.

    - Introduce 'zero_page_range' as a dax operation. This facilitates
    filesystem-dax operation without a block-device.

    - Introduce phys_to_target_node() to facilitate drivers that want to
    know the resulting numa node if a given reserved address range was
    onlined.

    - Advertise a persistence-domain for of_pmem and papr_scm. The
    persistence domain indicates where cpu-store cycles need to reach
    in the platform-memory subsystem before the platform will consider
    them power-fail protected.

    - Promote numa_map_to_online_node() to a cross-kernel generic
    facility.

    - Save x86 numa information to allow for node-id lookups for reserved
    memory ranges, and deploy that capability for the e820-pmem driver.

    - Pick up some miscellaneous minor fixes that missed v5.6-final,
    including some smatch reports in the ioctl path and some unit
    test compilation fixups.

    - Fixup some flexible-array declarations"

    * tag 'libnvdimm-for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (29 commits)
    dax: Move mandatory ->zero_page_range() check in alloc_dax()
    dax,iomap: Add helper dax_iomap_zero() to zero a range
    dax: Use new dax zero page method for zeroing a page
    dm,dax: Add dax zero_page_range operation
    s390,dcssblk,dax: Add dax zero_page_range operation to dcssblk driver
    dax, pmem: Add a dax operation zero_page_range
    pmem: Add functions for reading/writing page to/from pmem
    libnvdimm: Update persistence domain value for of_pmem and papr_scm device
    tools/test/nvdimm: Fix out of tree build
    libnvdimm/region: Fix build error
    libnvdimm/region: Replace zero-length array with flexible-array member
    libnvdimm/label: Replace zero-length array with flexible-array member
    ACPI: NFIT: Replace zero-length array with flexible-array member
    libnvdimm/region: Introduce an 'align' attribute
    libnvdimm/region: Introduce NDD_LABELING
    libnvdimm/namespace: Enforce memremap_compat_align()
    libnvdimm/pfn: Prevent raw mode fallback if pfn-infoblock valid
    libnvdimm: Out of bounds read in __nd_ioctl()
    acpi/nfit: improve bounds checking for 'func'
    mm/memremap_pages: Introduce memremap_compat_align()
    ...

    Linus Torvalds
     

03 Apr, 2020

5 commits

  • - Introduce 'zero_page_range' as a dax operation. This facilitates
    filesystem-dax operation without a block-device.

    - Advertise a persistence-domain for of_pmem and papr_scm. The
    persistence domain indicates where cpu-store cycles need to reach in
    the platform-memory subsystem before the platform will consider them
    power-fail protected.

    - Fixup some flexible-array declarations.

    Dan Williams
     
  • - Promote numa_map_to_online_node() to a cross-kernel generic facility.

    - Save x86 numa information to allow for node-id lookups for reserved
    memory ranges, and deploy that capability for the e820-pmem driver.

    - Introduce phys_to_target_node() to facilitate drivers that want to
    know the resulting numa node if a given reserved address range was
    onlined.

    Dan Williams
     
  • Pick up some miscellaneous minor fixes that missed v5.6-final,
    including some smatch reports in the ioctl path and some unit test
    compilation fixups.

    Dan Williams
     
  • The zero_page_range() dax operation is mandatory for dax devices. Right
    now that check happens in the dax_zero_page_range() function. Dan thinks
    that's too late and it's better to do the check earlier, in alloc_dax().

    I also modified alloc_dax() to return a pointer with an error code
    encoded in it in case of failure. Right now it returns NULL and the
    caller assumes the failure happened due to -ENOMEM. But with this
    ->zero_page_range() check, I need to return -EINVAL instead.
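
    The "pointer with an error code encoded in it" convention is the
    kernel's ERR_PTR()/IS_ERR()/PTR_ERR() scheme. The top MAX_ERRNO
    addresses are never valid, so they can carry -errno values. The
    sketch below re-creates those helpers in user space and applies them
    to a hypothetical allocator. struct mock_dax_ops, mock_alloc_dax() and
    mock_noop_zero() are illustrative names, not the dax API.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* User-space re-creation of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }

/* The top MAX_ERRNO addresses are never valid, so they encode -errno. */
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical ops table with one mandatory method. */
struct mock_dax_ops {
	int (*zero_page_range)(void *dev, unsigned long pgoff, size_t nr_pages);
};

struct mock_dax_dev {
	const struct mock_dax_ops *ops;
};

/* A trivial driver method, used only to show a successful allocation. */
static int mock_noop_zero(void *dev, unsigned long pgoff, size_t nr_pages)
{
	(void)dev;
	(void)pgoff;
	(void)nr_pages;
	return 0;
}

/* Validate the mandatory method up front and report why allocation failed. */
static struct mock_dax_dev *mock_alloc_dax(const struct mock_dax_ops *ops)
{
	struct mock_dax_dev *dev;

	if (!ops || !ops->zero_page_range)
		return ERR_PTR(-EINVAL);
	dev = malloc(sizeof(*dev));
	if (!dev)
		return ERR_PTR(-ENOMEM);
	dev->ops = ops;
	return dev;
}
```

    Callers then test IS_ERR() and propagate PTR_ERR(), rather than
    assuming that every failure was -ENOMEM.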

    Signed-off-by: Vivek Goyal
    Link: https://lore.kernel.org/r/20200401161125.GB9398@redhat.com
    Signed-off-by: Dan Williams

    Vivek Goyal
     
  • Add a dax operation, zero_page_range, to zero a page. This will also
    clear any known poison in the page being zeroed.

    For now, only one page may be zeroed in a single call. There are no
    callers which try to zero more than a page in a single call. Once we
    have callers that zero more than a page in a single call, we can add
    that support. The primary reason for not doing so yet is that it would
    add a little complexity to the dm implementation, where a range might
    span multiple underlying targets; the range would have to be split
    into multiple sub-ranges and zero_page_range() called on each target.
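
    The sketch below models the one-page-per-call rule on a user-space
    buffer. The page is zeroed, its poison flag is cleared, and any
    request for more than one page is refused. struct mock_pmem,
    MOCK_PAGE_SIZE and mock_zero_page_range() are hypothetical. The
    choice of -EIO and -ERANGE as error codes is an assumption.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MOCK_PAGE_SIZE 4096

/* Hypothetical device: a flat buffer of pages plus a per-page poison flag. */
struct mock_pmem {
	unsigned char *base;
	unsigned char *poison; /* one flag per page */
	size_t nr_pages;
};

/*
 * Zero exactly one page and clear any known poison on it. Multi-page
 * requests are refused for now, mirroring the one-page-per-call rule.
 */
static int mock_zero_page_range(struct mock_pmem *pmem, size_t pgoff, size_t nr_pages)
{
	if (nr_pages != 1)
		return -EIO;
	if (pgoff >= pmem->nr_pages)
		return -ERANGE;
	memset(pmem->base + pgoff * MOCK_PAGE_SIZE, 0, MOCK_PAGE_SIZE);
	pmem->poison[pgoff] = 0;
	return 0;
}
```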

    Suggested-by: Christoph Hellwig
    Signed-off-by: Vivek Goyal
    Reviewed-by: Pankaj Gupta
    Link: https://lore.kernel.org/r/20200228163456.1587-3-vgoyal@redhat.com
    Signed-off-by: Dan Williams

    Vivek Goyal