05 Jun, 2020

40 commits

  • Fix the following sparse warning:

    kernel/user.c:85:19: warning: symbol 'uidhash_table' was not declared.
    Should it be static?

    Reported-by: Hulk Robot
    Signed-off-by: Jason Yan
    Signed-off-by: Andrew Morton
    Cc: David Howells
    Cc: Greg Kroah-Hartman
    Cc: Rasmus Villemoes
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20200413082146.22737-1-yanaijie@huawei.com
    Signed-off-by: Linus Torvalds

    Jason Yan
     
  • "catch" is a reserved keyword in C++, so rename it to something both gcc
    and g++ accept.

    Rename "ign" for symmetry.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Link: http://lkml.kernel.org/r/20200331210905.GA31680@avx2
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Instead of keeping a NULL-terminated array, switch to ARRAY_SIZE(),
    which helps with further cleanup.
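
    As a rough illustration of the idea (the table and function names below
    are hypothetical and not taken from the patch), iterating with an
    ARRAY_SIZE()-style macro removes the need for a NULL sentinel:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for the kernel's ARRAY_SIZE() macro. */
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Hypothetical table; no trailing NULL entry is required. */
static const char *const backend_names[] = { "lzo", "lz4", "zstd" };

/* Return the index of name in the table, or -1 if it is not listed. */
static int find_backend(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < ARRAY_SIZE(backend_names); i++) {
		if (strcmp(backend_names[i], name) == 0)
			return (int)i;
	}
	return -1;
}
```

    The loop bound now follows the array definition automatically, so adding
    or removing an entry cannot leave a stale terminator behind.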

    Signed-off-by: Andy Shevchenko
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Acked-by: Minchan Kim
    Cc: Sergey Senozhatsky
    Cc: Jens Axboe
    Cc: Andy Shevchenko
    Link: http://lkml.kernel.org/r/20200508100758.51644-1-andriy.shevchenko@linux.intel.com
    Signed-off-by: Linus Torvalds

    Andy Shevchenko
     
  • Fix the following coccicheck warning:

    include/linux/mm.h:1371:8-9: WARNING: return of 0/1 in function 'cpupid_pid_unset' with return type bool

    Signed-off-by: Jason Yan
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Link: http://lkml.kernel.org/r/20200422071816.48879-1-yanaijie@huawei.com
    Signed-off-by: Linus Torvalds

    Jason Yan
     
  • Fix the following coccicheck warnings:

    mm/zbud.c:246:1-20: WARNING: Assignment of 0/1 to bool variable
    mm/mremap.c:777:2-8: WARNING: Assignment of 0/1 to bool variable
    mm/huge_memory.c:525:9-10: WARNING: return of 0/1 in function 'is_transparent_hugepage' with return type bool
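
    A minimal sketch of the pattern these warnings ask for, using
    hypothetical functions rather than the ones named above: bool variables
    and bool-returning functions use true/false instead of 1/0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical predicate: returns true/false rather than 1/0. */
static bool is_power_of_two(unsigned long n)
{
	if (n == 0)
		return false;	/* the flagged style would be "return 0" */
	return (n & (n - 1)) == 0;
}

/* Hypothetical caller: the flag is assigned true/false, not 1/0. */
static bool all_powers_of_two(const unsigned long *vals, int count)
{
	bool ok = true;		/* the flagged style would be "ok = 1" */
	int i;

	assert(vals != NULL || count == 0);
	for (i = 0; i < count; i++) {
		if (!is_power_of_two(vals[i]))
			ok = false;
	}
	return ok;
}
```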

    Reported-by: Hulk Robot
    Signed-off-by: Zou Wei
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Link: http://lkml.kernel.org/r/1586835930-47076-1-git-send-email-zou_wei@huawei.com
    Signed-off-by: Linus Torvalds

    Zou Wei
     
  • There is a typo in comment, fix it.

    Signed-off-by: Ethon Paul
    Signed-off-by: Andrew Morton
    Link: http://lkml.kernel.org/r/20200411004043.14686-1-ethp@qq.com
    Signed-off-by: Linus Torvalds

    Ethon Paul
     
  • There is a typo in comment, fix it.

    Signed-off-by: Ethon Paul
    Signed-off-by: Andrew Morton
    Link: http://lkml.kernel.org/r/20200411003513.14613-1-ethp@qq.com
    Signed-off-by: Linus Torvalds

    Ethon Paul
     
  • There is a typo in comment, fix it.

    Signed-off-by: Ethon Paul
    Signed-off-by: Andrew Morton
    Link: http://lkml.kernel.org/r/20200411002955.14545-1-ethp@qq.com
    Signed-off-by: Linus Torvalds

    Ethon Paul
     
  • There is a typo in comment, fix it.

    Signed-off-by: Ethon Paul
    Signed-off-by: Andrew Morton
    Acked-by: David Rientjes
    Link: http://lkml.kernel.org/r/20200411002247.14468-1-ethp@qq.com
    Signed-off-by: Linus Torvalds

    Ethon Paul
     
  • There is a typo in comment, fix it.

    Signed-off-by: Ethon Paul
    Signed-off-by: Andrew Morton
    Reviewed-by: Ralph Campbell
    Link: http://lkml.kernel.org/r/20200411064723.15855-1-ethp@qq.com
    Signed-off-by: Linus Torvalds

    Ethon Paul
     
  • There are some typos in comment, fix them.

    s/responsiblity/responsibility
    s/oflline/offline

    Signed-off-by: Ethon Paul
    Signed-off-by: Andrew Morton
    Link: http://lkml.kernel.org/r/20200411064246.15781-1-ethp@qq.com
    Signed-off-by: Linus Torvalds

    Ethon Paul
     
  • There are some typos in comment, fix them.

    s/Fortunatly/Fortunately
    s/taked/taken
    s/necessory/necessary
    s/shink/shrink

    Signed-off-by: Ethon Paul
    Signed-off-by: Andrew Morton
    Link: http://lkml.kernel.org/r/20200411064009.15727-1-ethp@qq.com
    Signed-off-by: Linus Torvalds

    Ethon Paul
     
  • There is a typo in comment, fix it.

    Signed-off-by: Ethon Paul
    Signed-off-by: Andrew Morton
    Reviewed-by: Ralph Campbell
    Link: http://lkml.kernel.org/r/20200411065141.15936-1-ethp@qq.com
    Signed-off-by: Linus Torvalds

    Ethon Paul
     
  • There is a typo in comment, fix it.

    Signed-off-by: Ethon Paul
    Signed-off-by: Andrew Morton
    Reviewed-by: Ralph Campbell
    Link: http://lkml.kernel.org/r/20200411071041.16161-1-ethp@qq.com
    Signed-off-by: Linus Torvalds

    Ethon Paul
     
  • There is a typo in comment, fix it.

    Signed-off-by: Ethon Paul
    Signed-off-by: Andrew Morton
    Reviewed-by: Ralph Campbell
    Link: http://lkml.kernel.org/r/20200411070701.16097-1-ethp@qq.com
    Signed-off-by: Linus Torvalds

    Ethon Paul
     
  • There is a typo in comment, fix it.

    Signed-off-by: Ethon Paul
    Signed-off-by: Andrew Morton
    Reviewed-by: Ralph Campbell
    Link: http://lkml.kernel.org/r/20200411070307.16021-1-ethp@qq.com
    Signed-off-by: Linus Torvalds

    Ethon Paul
     
  • There are some typos, fix them.

    s/regsitration/registration
    s/santity/sanity
    s/decremeting/decrementing

    Signed-off-by: Ethon Paul
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Reviewed-by: Ralph Campbell
    Link: http://lkml.kernel.org/r/20200411071544.16222-1-ethp@qq.com
    Signed-off-by: Linus Torvalds

    Ethon Paul
     
  • [akpm@linux-foundation.org: coding style fixes]
    Signed-off-by: Ethon Paul
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Reviewed-by: Ralph Campbell
    Link: http://lkml.kernel.org/r/20200410163714.14085-1-ethp@qq.com
    Signed-off-by: Linus Torvalds

    Ethon Paul
     
  • There is a typo in comment, fix it.

    Signed-off-by: Ethon Paul
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Reviewed-by: Ralph Campbell
    Link: http://lkml.kernel.org/r/20200410163206.14016-1-ethp@qq.com
    Signed-off-by: Linus Torvalds

    Ethon Paul
     
  • There is a typo in comment, fix it.

    Signed-off-by: Ethon Paul
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Reviewed-by: Ralph Campbell
    Link: http://lkml.kernel.org/r/20200410162427.13927-1-ethp@qq.com
    Signed-off-by: Linus Torvalds

    Ethon Paul
     
  • There is a typo in comment, fix it.
    s/recoreded/recorded

    Signed-off-by: Ethon Paul
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Reviewed-by: Ralph Campbell
    Link: http://lkml.kernel.org/r/20200410160328.13843-1-ethp@qq.com
    Signed-off-by: Linus Torvalds

    Ethon Paul
     
  • The current codebase makes use of the zero-length array language extension
    to the C90 standard, but the preferred mechanism to declare
    variable-length types such as these ones is a flexible array member[1][2],
    introduced in C99:

    struct foo {
            int stuff;
            struct boo array[];
    };

    By making use of the mechanism above, we will get a compiler warning in
    case the flexible array does not occur last in the structure, which will
    help us prevent some kind of undefined behavior bugs from being
    inadvertently introduced[3] to the codebase from now on.

    Also, notice that dynamic memory allocations won't be affected by this
    change:

    "Flexible array members have incomplete type, and so the sizeof operator
    may not be applied. As a quirk of the original implementation of
    zero-length arrays, sizeof evaluates to zero."[1]
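
    A minimal userspace sketch of how such a structure is typically sized
    and allocated. foo_alloc() and the contents of struct boo are
    illustrative assumptions; kernel code would normally use struct_size()
    with kmalloc() instead of the open-coded arithmetic shown here.

```c
#include <assert.h>
#include <stdlib.h>

struct boo {
	int value;
};

struct foo {
	int stuff;		/* number of entries in array[] */
	struct boo array[];	/* flexible array member, must come last */
};

/* Allocate a struct foo with room for count trailing entries. */
static struct foo *foo_alloc(int count)
{
	struct foo *f;

	assert(count >= 0);
	/* sizeof(*f) excludes array[], so the entries are added explicitly. */
	f = calloc(1, sizeof(*f) + (size_t)count * sizeof(f->array[0]));
	if (f)
		f->stuff = count;
	return f;
}
```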

    This issue was found with the help of Coccinelle.

    [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
    [2] https://github.com/KSPP/linux/issues/21
    [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

    [akpm@linux-foundation.org: fix build]
    Signed-off-by: chenqiwu
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Reviewed-by: Wei Yang
    Cc: Matthew Wilcox
    Cc: David Hildenbrand
    Cc: Michal Hocko
    Cc: Pankaj Gupta
    Cc: Yang Shi
    Cc: Qian Cai
    Cc: Baoquan He
    Link: http://lkml.kernel.org/r/1586599916-15456-1-git-send-email-qiwuchen55@gmail.com
    Signed-off-by: Linus Torvalds

    chenqiwu
     
  • Memory hotplug is broken for 32b systems at least since c6f03e2903c9 ("mm,
    memory_hotplug: remove zone restrictions") which has considerably reworked
    how memory can be associated with movable/kernel zones. The same is not
    really trivial to achieve in 32b where only lowmem is the kernel zone.
    While we can work around this immediate problem, there are likely other
    land mines hidden in other places.

    It is also quite dubious that there is a real usecase for the memory
    hotplug on 32b in the first place. Low memory is just too small to be
    hotpluggable (for hot add) and generally unusable for hotremove. Adding
    more memory to highmem is also dubious because it would increase the low
    mem or vmalloc space pressure for memmaps.

    Restrict the functionality to 64b systems. This will help future
    development to focus on usecases that have real life application. We can
    remove this restriction in future in presence of a real life usecase of
    course but until then make it explicit that hotplug on 32b is broken and
    requires a non trivial amount of work to fix.

    Robin said:
    "32-bit Arm doesn't support memory hotplug, and as far as I'm aware
    there's little likelihood of it ever wanting to. FWIW it looks like
    SuperH is the only pure-32-bit architecture to have hotplug support at
    all"

    Signed-off-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Acked-by: David Hildenbrand
    Acked-by: Baoquan He
    Cc: Wei Yang
    Cc: Naoya Horiguchi
    Cc: Oscar Salvador
    Cc: Robin Murphy
    Cc: Vamshi K Sthambamkadi
    Link: http://lkml.kernel.org/r/20200218100532.GA4151@dhcp22.suse.cz
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=206401
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • Currently, when adding memory, we create entries in /sys/firmware/memmap/
    as "System RAM". This will lead kexec-tools to add that memory to the
    fixed-up initial memmap for a kexec kernel (loaded via kexec_load()). The
    memory will be considered initial System RAM by the kexec'd kernel and can
    no longer be reconfigured. This is not what happens during a real reboot.

    Let's add our memory via add_memory_driver_managed() now, so we won't
    create entries in /sys/firmware/memmap/ and indicate the memory as "System
    RAM (kmem)" in /proc/iomem. This allows everybody (especially
    kexec-tools) to identify that this memory is special and has to be treated
    differently than ordinary (hotplugged) System RAM.

    Before configuring the namespace:
    [root@localhost ~]# cat /proc/iomem
    ...
    140000000-33fffffff : Persistent Memory
      140000000-33fffffff : namespace0.0
    3280000000-32ffffffff : PCI Bus 0000:00

    After configuring the namespace:
    [root@localhost ~]# cat /proc/iomem
    ...
    140000000-33fffffff : Persistent Memory
      140000000-1481fffff : namespace0.0
      148200000-33fffffff : dax0.0
    3280000000-32ffffffff : PCI Bus 0000:00

    After loading kmem before this change:
    [root@localhost ~]# cat /proc/iomem
    ...
    140000000-33fffffff : Persistent Memory
      140000000-1481fffff : namespace0.0
      150000000-33fffffff : dax0.0
        150000000-33fffffff : System RAM
    3280000000-32ffffffff : PCI Bus 0000:00

    After loading kmem after this change:
    [root@localhost ~]# cat /proc/iomem
    ...
    140000000-33fffffff : Persistent Memory
      140000000-1481fffff : namespace0.0
      150000000-33fffffff : dax0.0
        150000000-33fffffff : System RAM (kmem)
    3280000000-32ffffffff : PCI Bus 0000:00

    After a proper reboot:
    [root@localhost ~]# cat /proc/iomem
    ...
    140000000-33fffffff : Persistent Memory
      140000000-1481fffff : namespace0.0
      148200000-33fffffff : dax0.0
    3280000000-32ffffffff : PCI Bus 0000:00

    Within the kexec kernel before this change:
    [root@localhost ~]# cat /proc/iomem
    ...
    140000000-33fffffff : Persistent Memory
      140000000-1481fffff : namespace0.0
      150000000-33fffffff : System RAM
    3280000000-32ffffffff : PCI Bus 0000:00

    Within the kexec kernel after this change:
    [root@localhost ~]# cat /proc/iomem
    ...
    140000000-33fffffff : Persistent Memory
      140000000-1481fffff : namespace0.0
      148200000-33fffffff : dax0.0
    3280000000-32ffffffff : PCI Bus 0000:00

    /sys/firmware/memmap/ before this change:
    0000000000000000-000000000009fc00 (System RAM)
    000000000009fc00-00000000000a0000 (Reserved)
    00000000000f0000-0000000000100000 (Reserved)
    0000000000100000-00000000bffdf000 (System RAM)
    00000000bffdf000-00000000c0000000 (Reserved)
    00000000feffc000-00000000ff000000 (Reserved)
    00000000fffc0000-0000000100000000 (Reserved)
    0000000100000000-0000000140000000 (System RAM)
    0000000150000000-0000000340000000 (System RAM)

    /sys/firmware/memmap/ after a proper reboot:
    0000000000000000-000000000009fc00 (System RAM)
    000000000009fc00-00000000000a0000 (Reserved)
    00000000000f0000-0000000000100000 (Reserved)
    0000000000100000-00000000bffdf000 (System RAM)
    00000000bffdf000-00000000c0000000 (Reserved)
    00000000feffc000-00000000ff000000 (Reserved)
    00000000fffc0000-0000000100000000 (Reserved)
    0000000100000000-0000000140000000 (System RAM)

    /sys/firmware/memmap/ after this change:
    0000000000000000-000000000009fc00 (System RAM)
    000000000009fc00-00000000000a0000 (Reserved)
    00000000000f0000-0000000000100000 (Reserved)
    0000000000100000-00000000bffdf000 (System RAM)
    00000000bffdf000-00000000c0000000 (Reserved)
    00000000feffc000-00000000ff000000 (Reserved)
    00000000fffc0000-0000000100000000 (Reserved)
    0000000100000000-0000000140000000 (System RAM)

    kexec-tools already seems to basically ignore any System RAM that's not on
    top level when searching for areas to place kexec images - but also for
    determining crash areas to dump via kdump. Changing the resource name
    won't have an impact.

    Properly handle unloading of the driver after memory hotremove failed, by
    duplicating the string if necessary.

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Acked-by: Pankaj Gupta
    Cc: Michal Hocko
    Cc: Pankaj Gupta
    Cc: Wei Yang
    Cc: Baoquan He
    Cc: Dave Hansen
    Cc: Eric Biederman
    Cc: Pavel Tatashin
    Cc: Dan Williams
    Link: http://lkml.kernel.org/r/20200508084217.9160-5-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • Memory flagged with IORESOURCE_MEM_DRIVER_MANAGED is special - it won't be
    part of the initial memmap of the kexec kernel and not all memory might be
    accessible. Don't place any kexec images onto it.

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Cc: Michal Hocko
    Cc: Pankaj Gupta
    Cc: Wei Yang
    Cc: Baoquan He
    Cc: Dave Hansen
    Cc: Eric Biederman
    Cc: Pavel Tatashin
    Cc: Dan Williams
    Link: http://lkml.kernel.org/r/20200508084217.9160-4-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • Patch series "mm/memory_hotplug: Interface to add driver-managed system
    ram", v4.

    kexec (via kexec_load()) can currently not properly handle memory added
    via dax/kmem, and will have similar issues with virtio-mem. kexec-tools
    will currently add all memory to the fixed-up initial firmware memmap. In
    case of dax/kmem, this means that - in contrast to a proper reboot - how
    that persistent memory will be used can no longer be configured by the
    kexec'd kernel. In case of virtio-mem it will be harmful, because that
    memory might contain inaccessible pieces that require coordination with
    the hypervisor first.

    In both cases, we want to let the driver in the kexec'd kernel handle
    detecting and adding the memory, like during an ordinary reboot.
    Introduce add_memory_driver_managed(). More on the semantics is in patch
    #1.

    In the future, we might want to make this behavior configurable for
    dax/kmem - either by configuring it in the kernel (which would then also
    allow configuring kexec_file_load()) or in kexec-tools by also adding
    "System RAM (kmem)" memory from /proc/iomem to the fixed-up initial
    firmware memmap.

    More on the motivation can be found in [1] and [2].

    [1] https://lkml.kernel.org/r/20200429160803.109056-1-david@redhat.com
    [2] https://lkml.kernel.org/r/20200430102908.10107-1-david@redhat.com

    This patch (of 3):

    Some device drivers rely on memory they manage not being added to the
    initial (firmware) memmap as system RAM - so it's not used as initial
    system RAM by the kernel and the driver stays in control. While this is
    the case during cold boot and after a reboot, kexec is not aware of that
    and might add such memory to the initial (firmware) memmap of the kexec
    kernel. We need ways to teach kernel and userspace that this system ram
    is different.

    For example, dax/kmem allows deciding at runtime if persistent memory is
    to be used as system ram. Another future user is virtio-mem, which has to
    coordinate with its hypervisor to deal with inaccessible parts within
    memory resources.

    We want to let users in the kernel (esp. kexec) but also user space
    (esp. kexec-tools) know that this memory has different semantics and
    needs to be handled differently:
    1. Don't create entries in /sys/firmware/memmap/
    2. Name the memory resource "System RAM ($DRIVER)" (exposed via
    /proc/iomem) ($DRIVER might be "kmem", "virtio_mem").
    3. Flag the memory resource IORESOURCE_MEM_DRIVER_MANAGED

    /sys/firmware/memmap/ [1] represents the "raw firmware-provided memory
    map" because "on most architectures that firmware-provided memory map is
    modified afterwards by the kernel itself". The primary user is kexec on
    x86-64. Since commit d96ae5309165 ("memory-hotplug: create
    /sys/firmware/memmap entry for new memory"), we add all hotplugged memory
    to that firmware memmap - which makes perfect sense for traditional memory
    hotplug on x86-64, where real HW will also add hotplugged DIMMs to the
    firmware memmap. We replicate what the "raw firmware-provided memory map"
    looks like after hot(un)plug.

    To keep things simple, let the user provide the full resource name instead
    of only the driver name - this way, we don't have to manually
    allocate/craft strings for memory resources. Also use the resource name
    to make decisions, to avoid passing additional flags. In case the name
    isn't "System RAM", it's special.
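
    The name-based decision described above can be sketched roughly as
    follows. Both helper names are hypothetical and only model the rule
    stated in this message; they are not the functions added by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: memory whose resource name is exactly "System RAM"
 * is ordinary hotplugged memory; any other name, such as
 * "System RAM (kmem)", marks it as driver managed.
 */
static bool resource_is_driver_managed(const char *resource_name)
{
	assert(resource_name != NULL);
	return strcmp(resource_name, "System RAM") != 0;
}

/* Hypothetical helper: only ordinary memory gets a firmware memmap entry. */
static bool wants_firmware_memmap_entry(const char *resource_name)
{
	return !resource_is_driver_managed(resource_name);
}
```

    Keying the decision on the name means callers pass one string and no
    extra flag argument, which matches the stated goal of keeping the
    interface simple.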

    We don't have to worry about firmware_map_remove() on the removal path.
    If there is no entry, it will simply return with -EINVAL.

    We'll adapt dax/kmem in a follow-up patch.

    [1] https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-firmware-memmap

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Acked-by: Pankaj Gupta
    Cc: Michal Hocko
    Cc: Pankaj Gupta
    Cc: Wei Yang
    Cc: Baoquan He
    Cc: Dave Hansen
    Cc: Eric Biederman
    Cc: Pavel Tatashin
    Cc: Dan Williams
    Link: http://lkml.kernel.org/r/20200508084217.9160-1-david@redhat.com
    Link: http://lkml.kernel.org/r/20200508084217.9160-3-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • The comment in add_memory_resource() is stale: hotadd_new_pgdat() will no
    longer call get_pfn_range_for_nid(), as a hotadded pgdat will simply span
    no pages at all, until memory is moved to the zone/node via
    move_pfn_range_to_zone() - e.g., when onlining memory blocks.

    The only archs that care about memblocks for hotplugged memory (either for
    iterating over all system RAM or testing for memory validity) are arm64,
    s390x, and powerpc - due to CONFIG_ARCH_KEEP_MEMBLOCK. Without
    CONFIG_ARCH_KEEP_MEMBLOCK, we can simply stop messing with memblocks.

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Acked-by: Mike Rapoport
    Acked-by: Michal Hocko
    Cc: Michal Hocko
    Cc: Baoquan He
    Cc: Oscar Salvador
    Cc: Pankaj Gupta
    Cc: Mike Rapoport
    Cc: Anshuman Khandual
    Link: http://lkml.kernel.org/r/20200422155353.25381-3-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • Patch series "mm/memory_hotplug: handle memblocks only with
    CONFIG_ARCH_KEEP_MEMBLOCK", v1.

    A hotadded node/pgdat will span no pages at all, until memory is moved to
    the zone/node via move_pfn_range_to_zone() -> resize_pgdat_range - e.g.,
    when onlining memory blocks. We don't have to initialize the
    node_start_pfn to the memory we are adding.

    This patch (of 2):

    Especially, there is an inconsistency:
    - Hotplugging memory to a memory-less node with cpus: node_start_pfn == 0
    - Offlining and removing last memory from a node: node_start_pfn == 0
    - Hotplugging memory to a memory-less node without cpus: node_start_pfn != 0

    As soon as memory is onlined, node_start_pfn is overwritten with the
    actual start. E.g., when adding two DIMMs but only onlining one of them,
    only that DIMM (with online memory blocks) is spanned by the node.

    Currently, the validity of node_start_pfn really is linked to
    node_spanned_pages != 0. With node_spanned_pages == 0 (e.g., before
    onlining memory), it has no meaning.

    So let's stop setting node_start_pfn, just to be overwritten via
    move_pfn_range_to_zone(). This avoids confusion when looking at the code,
    wondering which magic will be performed with the node_start_pfn in this
    function, when hotadding a pgdat.

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Acked-by: Pankaj Gupta
    Cc: Michal Hocko
    Cc: Baoquan He
    Cc: Oscar Salvador
    Cc: Pankaj Gupta
    Cc: Anshuman Khandual
    Cc: Mike Rapoport
    Cc: Michal Hocko
    Link: http://lkml.kernel.org/r/20200422155353.25381-1-david@redhat.com
    Link: http://lkml.kernel.org/r/20200422155353.25381-2-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • Fortunately, all users of is_mem_section_removable() are gone. Get rid of
    it, including some now unnecessary functions.

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Reviewed-by: Wei Yang
    Reviewed-by: Baoquan He
    Acked-by: Michal Hocko
    Cc: Michael Ellerman
    Cc: Benjamin Herrenschmidt
    Cc: Oscar Salvador
    Link: http://lkml.kernel.org/r/20200407135416.24093-3-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • In commit 53cdc1cb29e8 ("drivers/base/memory.c: indicate all memory blocks
    as removable"), the user space interface to compute whether a memory block
    can be offlined (exposed via /sys/devices/system/memory/memoryX/removable)
    has effectively been deprecated. We want to remove the leftovers of the
    kernel implementation.

    When offlining a memory block (mm/memory_hotplug.c:__offline_pages()),
    we'll start by:
    1. Testing if it contains any holes, and reject if so
    2. Testing if pages belong to different zones, and reject if so
    3. Isolating the page range, checking if it contains any unmovable pages

    Using is_mem_section_removable() before trying to offline is not only
    racy, it can easily result in false positives/negatives. Let's stop
    manually checking is_mem_section_removable(), and let device_offline()
    handle it completely instead. We can remove the racy
    is_mem_section_removable() implementation next.

    We now take more locks (e.g., memory hotplug lock when offlining and the
    zone lock when isolating), but maybe we should optimize that
    implementation instead if this ever becomes a real problem (after all,
    memory unplug is already an expensive operation). We started using
    is_mem_section_removable() in commit 51925fb3c5c9 ("powerpc/pseries:
    Implement memory hotplug remove in the kernel"), with the initial
    hotremove support of lmbs.

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Cc: Nathan Fontenot
    Cc: Michael Ellerman
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michal Hocko
    Cc: Oscar Salvador
    Cc: Baoquan He
    Cc: Wei Yang
    Link: http://lkml.kernel.org/r/20200407135416.24093-2-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • A misbehaving qemu created a situation where the ACPI SRAT table
    advertised one fewer proximity domains than intended. The NFIT table did
    describe all the expected proximity domains. This caused the device dax
    driver to assign an impossible target_node to the device, and when
    hotplugged as system memory, this would fail with the following signature:

    BUG: kernel NULL pointer dereference, address: 0000000000000088
    #PF: supervisor read access in kernel mode
    #PF: error_code(0x0000) - not-present page
    PGD 80000001767d4067 P4D 80000001767d4067 PUD 10e0c4067 PMD 0
    Oops: 0000 [#1] SMP PTI
    CPU: 4 PID: 22737 Comm: kswapd3 Tainted: G O 5.6.0-rc5 #9
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
    RIP: 0010:prepare_kswapd_sleep+0x7c/0xc0
    Code: 89 df e8 87 fd ff ff 89 c2 31 c0 84 d2 74 e6 0f 1f 44 00 00 48 8b 05 fb af 7a 01 48 63 93 88 1d 01 00 48 8b 84 d0 20 0f 00 00 3b 98 88 00 00 00 75 28 f0 80 a0 80 00 00 00 fe f0 80 a3 38 20
    RSP: 0018:ffffc900017a3e78 EFLAGS: 00010202
    RAX: 0000000000000000 RBX: ffff8881209e0000 RCX: 0000000000000000
    RDX: 0000000000000003 RSI: 0000000000000000 RDI: ffff8881209e0e80
    RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000008000
    R10: 0000000000000000 R11: 0000000000000003 R12: 0000000000000003
    R13: 0000000000000003 R14: 0000000000000000 R15: ffffc900017a3ec8
    FS: 0000000000000000(0000) GS:ffff888318c00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000088 CR3: 0000000120b50002 CR4: 00000000001606e0
    Call Trace:
    kswapd+0x103/0x520
    kthread+0x120/0x140
    ret_from_fork+0x3a/0x50

    Add a check in the add_memory path to fail if the node to which we are
    adding memory is not in the node_possible_map.
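
    A simplified userspace model of such a check. The bitmask, the
    MAX_NODES limit, the function names and the -EINVAL return value are
    assumptions made for this sketch, not the kernel's actual data
    structures or code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MAX_NODES 64	/* assumed limit for this sketch */

/* Stand-in for node_possible_map: bit n set means node n is possible. */
static unsigned long long possible_nodes = 0x7ULL;	/* nodes 0, 1 and 2 */

static bool node_is_possible(int nid)
{
	if (nid < 0 || nid >= MAX_NODES)
		return false;
	return ((possible_nodes >> nid) & 1ULL) != 0;
}

/* Hypothetical add path: reject impossible nodes before touching them. */
static int add_memory_checked(int nid)
{
	if (!node_is_possible(nid))
		return -EINVAL;
	/* ... the real work of adding memory would follow here ... */
	return 0;
}
```

    Failing early keeps an impossible target_node from ever reaching code
    that assumes per-node data exists, such as the kswapd path in the trace
    above.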

    Signed-off-by: Vishal Verma
    Signed-off-by: Andrew Morton
    Acked-by: David Hildenbrand
    Acked-by: Michal Hocko
    Cc: David Hildenbrand
    Cc: Dan Williams
    Cc: Dave Hansen
    Link: http://lkml.kernel.org/r/20200416225438.15208-1-vishal.l.verma@intel.com
    Signed-off-by: Linus Torvalds

    Vishal Verma
     
  • For kvmalloc'ed data objects that contain sensitive information like
    cryptographic keys, we need to make sure that the buffer is always cleared
    before freeing it. Using memset() alone for buffer clearing may not
    provide certainty as the compiler may compile it away. To be sure, the
    special memzero_explicit() has to be used.

    This patch introduces a new kvfree_sensitive() for freeing those sensitive
    data objects allocated by kvmalloc(). The relevant places where
    kvfree_sensitive() can be used are modified to use it.
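
    A userspace analogue of the "clear, then free" idea. Note that this
    sketch uses volatile stores to keep the compiler from dropping the
    clearing, which is a different technique from the kernel's
    memzero_explicit(); zero_explicit() and free_sensitive() are
    hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Clear len bytes through a volatile pointer so the compiler cannot drop
 * the stores as dead, which it is allowed to do for a plain memset()
 * placed directly before free().
 */
static void zero_explicit(void *buf, size_t len)
{
	volatile unsigned char *p = buf;

	while (len--)
		*p++ = 0;
}

/* Hypothetical userspace analogue of a "clear, then free" helper. */
static void free_sensitive(void *buf, size_t len)
{
	if (!buf)
		return;
	zero_explicit(buf, len);
	free(buf);
}
```

    Taking the length as a parameter means the caller has to remember the
    size of the allocation for as long as the buffer lives.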

    Fixes: 4f0882491a14 ("KEYS: Avoid false positive ENOMEM error on key read")
    Suggested-by: Linus Torvalds
    Signed-off-by: Waiman Long
    Signed-off-by: Andrew Morton
    Reviewed-by: Eric Biggers
    Acked-by: David Howells
    Cc: Jarkko Sakkinen
    Cc: James Morris
    Cc: "Serge E. Hallyn"
    Cc: Joe Perches
    Cc: Matthew Wilcox
    Cc: David Rientjes
    Cc: Uladzislau Rezki
    Link: http://lkml.kernel.org/r/20200407200318.11711-1-longman@redhat.com
    Signed-off-by: Linus Torvalds

    Waiman Long
     
  • Most architectures define kmap_prot to be PAGE_KERNEL.

    Let sparc and xtensa define their own and define PAGE_KERNEL as the
    default if not overridden.
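
    The "default unless overridden" pattern can be sketched with the
    preprocessor as below. The numeric protection values and
    PAGE_KERNEL_SPECIAL are made up for illustration, and
    default_kmap_prot() exists only to make the result observable.

```c
#include <assert.h>

/* Stand-ins for page protection values; the numbers are made up. */
#define PAGE_KERNEL		0x1
#define PAGE_KERNEL_SPECIAL	0x2

/*
 * An architecture header that needs a different value would define
 * kmap_prot before this point, e.g.:
 *
 *   #define kmap_prot PAGE_KERNEL_SPECIAL
 */

/* Generic fallback: used only when the architecture did not override it. */
#ifndef kmap_prot
#define kmap_prot PAGE_KERNEL
#endif

static int default_kmap_prot(void)
{
	return kmap_prot;
}
```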

    [akpm@linux-foundation.org: coding style fixes]
    Suggested-by: Christoph Hellwig
    Signed-off-by: Ira Weiny
    Signed-off-by: Andrew Morton
    Cc: Al Viro
    Cc: Andy Lutomirski
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Christian König
    Cc: Chris Zankel
    Cc: Daniel Vetter
    Cc: Dan Williams
    Cc: Dave Hansen
    Cc: "David S. Miller"
    Cc: Helge Deller
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: "James E.J. Bottomley"
    Cc: Max Filippov
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20200507150004.1423069-16-ira.weiny@intel.com
    Signed-off-by: Linus Torvalds

    Ira Weiny
     
  • linux/highmem.h has not been needed for the pte_offset_map => kmap_atomic
    use in sparc for some time (~2002)

    Remove this include.

    Suggested-by: Al Viro
    Signed-off-by: Ira Weiny
    Signed-off-by: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Christian König
    Cc: Christoph Hellwig
    Cc: Chris Zankel
    Cc: Daniel Vetter
    Cc: Dan Williams
    Cc: Dave Hansen
    Cc: "David S. Miller"
    Cc: Helge Deller
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: "James E.J. Bottomley"
    Cc: Max Filippov
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20200507150004.1423069-15-ira.weiny@intel.com
    Signed-off-by: Linus Torvalds

    Ira Weiny
     
  • parisc reimplements the kmap calls only in order to flush its dcache. This is
    arguably an abuse of kmap but regardless it is messy and confusing.

    Remove the duplicate code and have parisc define ARCH_HAS_FLUSH_ON_KUNMAP
    for a kunmap_flush_on_unmap() architecture specific call to flush the
    cache.

    Suggested-by: Al Viro
    Signed-off-by: Ira Weiny
    Signed-off-by: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Christian König
    Cc: Christoph Hellwig
    Cc: Chris Zankel
    Cc: Daniel Vetter
    Cc: Dan Williams
    Cc: Dave Hansen
    Cc: "David S. Miller"
    Cc: Helge Deller
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: "James E.J. Bottomley"
    Cc: Max Filippov
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20200507150004.1423069-14-ira.weiny@intel.com
    Signed-off-by: Linus Torvalds

    Ira Weiny
     
  • kmap_atomic_to_page() has no callers and is only defined on 1 arch and
    declared on another. Remove it.

    Suggested-by: Al Viro
    Signed-off-by: Ira Weiny
    Signed-off-by: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Christian König
    Cc: Christoph Hellwig
    Cc: Chris Zankel
    Cc: Daniel Vetter
    Cc: Dan Williams
    Cc: Dave Hansen
    Cc: "David S. Miller"
    Cc: Helge Deller
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: "James E.J. Bottomley"
    Cc: Max Filippov
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20200507150004.1423069-13-ira.weiny@intel.com
    Signed-off-by: Linus Torvalds

    Ira Weiny
     
  • kmap_atomic_prot() is now exported by all architectures. Use this
    function rather than open coding a driver specific kmap_atomic.

    [arnd@arndb.de: include linux/highmem.h]
    Link: http://lkml.kernel.org/r/20200508220150.649044-1-arnd@arndb.de
    Signed-off-by: Ira Weiny
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Reviewed-by: Christian König
    Reviewed-by: Christoph Hellwig
    Acked-by: Daniel Vetter
    Cc: Al Viro
    Cc: Andy Lutomirski
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Chris Zankel
    Cc: Dan Williams
    Cc: Dave Hansen
    Cc: "David S. Miller"
    Cc: Helge Deller
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: "James E.J. Bottomley"
    Cc: Max Filippov
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20200507150004.1423069-12-ira.weiny@intel.com
    Signed-off-by: Linus Torvalds

    Ira Weiny
     
  • To support kmap_atomic_prot(), all architectures need to support
    protections passed to their kmap_atomic_high() function. Pass protections
    into kmap_atomic_high() and change the name to kmap_atomic_high_prot() to
    match.

    Then define kmap_atomic_prot() as a core function which calls
    kmap_atomic_high_prot() when needed.

    Finally, redefine kmap_atomic() as a wrapper of kmap_atomic_prot() with
    the default kmap_prot exported by the architectures.
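
    The layering described above, reduced to a userspace model. The struct
    page fields, pgprot_t type, protection value and mapping window are
    simplified assumptions, and the sketch leaves out everything else the
    real functions have to do; only the call structure is the point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned long pgprot_t;		/* simplified stand-in */
#define kmap_prot ((pgprot_t)0x1)	/* assumed default protection */

struct page {
	bool highmem;		/* true if the page has no permanent mapping */
	void *lowmem_addr;	/* address used for lowmem pages */
	pgprot_t mapped_prot;	/* records the protection last requested */
};

static char highmem_window[64];	/* pretend temporary mapping slot */

/* Arch-level piece: maps a highmem page with the given protection. */
static void *kmap_atomic_high_prot(struct page *page, pgprot_t prot)
{
	page->mapped_prot = prot;
	return highmem_window;
}

/* Core function: only highmem pages need the arch-level mapping. */
static void *kmap_atomic_prot(struct page *page, pgprot_t prot)
{
	assert(page != NULL);
	if (!page->highmem)
		return page->lowmem_addr;
	return kmap_atomic_high_prot(page, prot);
}

/* kmap_atomic() is then just the default-protection wrapper. */
static void *kmap_atomic(struct page *page)
{
	return kmap_atomic_prot(page, kmap_prot);
}
```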

    Signed-off-by: Ira Weiny
    Signed-off-by: Andrew Morton
    Reviewed-by: Christoph Hellwig
    Cc: Al Viro
    Cc: Andy Lutomirski
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Christian König
    Cc: Chris Zankel
    Cc: Daniel Vetter
    Cc: Dan Williams
    Cc: Dave Hansen
    Cc: "David S. Miller"
    Cc: Helge Deller
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: "James E.J. Bottomley"
    Cc: Max Filippov
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20200507150004.1423069-11-ira.weiny@intel.com
    Signed-off-by: Linus Torvalds

    Ira Weiny
     
  • To support kmap_atomic_prot() on all architectures each arch must support
    protections passed in to them.

    Change csky, mips, nds32 and xtensa to use their global constant kmap_prot
    rather than a hard-coded value that was equal to it.

    Signed-off-by: Ira Weiny
    Signed-off-by: Andrew Morton
    Reviewed-by: Christoph Hellwig
    Cc: Al Viro
    Cc: Andy Lutomirski
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Christian König
    Cc: Chris Zankel
    Cc: Daniel Vetter
    Cc: Dan Williams
    Cc: Dave Hansen
    Cc: "David S. Miller"
    Cc: Helge Deller
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: "James E.J. Bottomley"
    Cc: Max Filippov
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20200507150004.1423069-10-ira.weiny@intel.com
    Signed-off-by: Linus Torvalds

    Ira Weiny
     
  • We want to support kmap_atomic_prot() on all architectures and it makes
    sense to define kmap_atomic() to use the default kmap_prot.

    So we ensure all archs have a globally available kmap_prot either as a
    define or exported symbol.

    Signed-off-by: Ira Weiny
    Signed-off-by: Andrew Morton
    Reviewed-by: Christoph Hellwig
    Cc: Al Viro
    Cc: Andy Lutomirski
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Christian König
    Cc: Chris Zankel
    Cc: Daniel Vetter
    Cc: Dan Williams
    Cc: Dave Hansen
    Cc: "David S. Miller"
    Cc: Helge Deller
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: "James E.J. Bottomley"
    Cc: Max Filippov
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20200507150004.1423069-9-ira.weiny@intel.com
    Signed-off-by: Linus Torvalds

    Ira Weiny