18 Aug, 2018

1 commit

  • Most functions in memblock already use phys_addr_t to represent a
    physical address with __memblock_free_late() being an exception.

    This patch replaces u64 with phys_addr_t in __memblock_free_late() and
    switches several format strings from %llx to %pa to avoid casting from
    phys_addr_t to u64.

    Link: http://lkml.kernel.org/r/1530637506-1256-1-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Reviewed-by: Pavel Tatashin
    Acked-by: Michal Hocko
    Cc: Pasha Tatashin
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

15 Aug, 2018

1 commit

  • Pull documentation update from Jonathan Corbet:
    "This was a moderately busy cycle for docs, with the usual collection
    of small fixes and updates.

    We also have new ktime_get_*() docs from Arnd, some kernel-doc fixes,
    a new set of Italian translations (non so se vale la pena, ma non fa
    male - speriamo bene: 'I don't know if it's worth it, but it doesn't
    hurt - let's hope for the best'), and some extensive early
    memory-management documentation improvements from Mike Rapoport"

    * tag 'docs-4.19' of git://git.lwn.net/linux: (52 commits)
    Documentation: corrections to console/console.txt
    Documentation: add ioctl number entry for v4l2-subdev.h
    Remove gendered language from management style documentation
    scripts/kernel-doc: Escape all literal braces in regexes
    docs/mm: add description of boot time memory management
    docs/mm: memblock: add overview documentation
    docs/mm: memblock: add kernel-doc description for memblock types
    docs/mm: memblock: add kernel-doc comments for memblock_add[_node]
    docs/mm: memblock: update kernel-doc comments
    mm/memblock: add a name for memblock flags enumeration
    docs/mm: bootmem: add overview documentation
    docs/mm: bootmem: add kernel-doc description of 'struct bootmem_data'
    docs/mm: bootmem: fix kernel-doc warnings
    docs/mm: nobootmem: fixup kernel-doc comments
    mm/bootmem: drop duplicated kernel-doc comments
    Documentation: vm.txt: Adding 'nr_hugepages_mempolicy' parameter description.
    doc:it_IT: translation for kernel-hacking
    docs: Fix the reference labels in Locking.rst
    doc: tracing: Fix a typo of trace_stat
    mm: Introduce new type vm_fault_t
    ...

    Linus Torvalds
     

22 Jul, 2018

1 commit

  • Commit 26f09e9b3a06 ("mm/memblock: add memblock memory allocation apis")
    introduced two new function definitions:

    memblock_virt_alloc_try_nid_nopanic()
    memblock_virt_alloc_try_nid()

    and commit ea1f5f3712af ("mm: define memblock_virt_alloc_try_nid_raw")
    introduced the following function definition:

    memblock_virt_alloc_try_nid_raw()

    This commit adds an include of the <linux/bootmem.h> header file to
    provide the missing function prototypes. This silences the following
    gcc warning (W=1):

    mm/memblock.c:1334:15: warning: no previous prototype for `memblock_virt_alloc_try_nid_raw' [-Wmissing-prototypes]
    mm/memblock.c:1371:15: warning: no previous prototype for `memblock_virt_alloc_try_nid_nopanic' [-Wmissing-prototypes]
    mm/memblock.c:1407:15: warning: no previous prototype for `memblock_virt_alloc_try_nid' [-Wmissing-prototypes]

    Also add #ifdef guards to prevent a compilation failure on mips/ia64
    where CONFIG_NO_BOOTMEM=n, as could be seen in commit 6cc22dc08a24
    ("revert "mm/memblock: add missing include <linux/bootmem.h>"").

    Because mm/Makefile already does:

    obj-$(CONFIG_HAVE_MEMBLOCK) += memblock.o

    The #ifdef has been simplified from:

    #if defined(CONFIG_HAVE_MEMBLOCK) && defined(CONFIG_NO_BOOTMEM)

    to simply:

    #if defined(CONFIG_NO_BOOTMEM)

    Link: http://lkml.kernel.org/r/20180626184422.24974-1-malat@debian.org
    Signed-off-by: Mathieu Malaterre
    Suggested-by: Tony Luck
    Suggested-by: Michal Hocko
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mathieu Malaterre
     

15 Jul, 2018

1 commit

  • Mike Rapoport is converting architectures from the bootmem to the
    nobootmem allocator. While doing so for m68k, Geert noticed that he
    gets a scary-looking warning:

    WARNING: CPU: 0 PID: 0 at mm/memblock.c:230
    memblock_find_in_range_node+0x11c/0x1be
    memblock: bottom-up allocation failed, memory hotunplug may be affected
    Modules linked in:
    CPU: 0 PID: 0 Comm: swapper Not tainted
    4.18.0-rc3-atari-01343-gf2fb5f2e09a97a3c-dirty #7
    Call Trace: __warn+0xa8/0xc2
    kernel_pg_dir+0x0/0x1000
    netdev_lower_get_next+0x2/0x22
    warn_slowpath_fmt+0x2e/0x36
    memblock_find_in_range_node+0x11c/0x1be
    memblock_find_in_range_node+0x11c/0x1be
    memblock_find_in_range_node+0x0/0x1be
    vprintk_func+0x66/0x6e
    memblock_virt_alloc_internal+0xd0/0x156
    netdev_lower_get_next+0x2/0x22
    netdev_lower_get_next+0x2/0x22
    kernel_pg_dir+0x0/0x1000
    memblock_virt_alloc_try_nid_nopanic+0x58/0x7a
    netdev_lower_get_next+0x2/0x22
    kernel_pg_dir+0x0/0x1000
    kernel_pg_dir+0x0/0x1000
    EXPTBL+0x234/0x400
    EXPTBL+0x234/0x400
    alloc_node_mem_map+0x4a/0x66
    netdev_lower_get_next+0x2/0x22
    free_area_init_node+0xe2/0x29e
    EXPTBL+0x234/0x400
    paging_init+0x430/0x462
    kernel_pg_dir+0x0/0x1000
    printk+0x0/0x1a
    EXPTBL+0x234/0x400
    setup_arch+0x1b8/0x22c
    start_kernel+0x4a/0x40a
    _sinittext+0x344/0x9e8

    The warning is basically saying that a top-down allocation can break
    memory hotremove because memblock allocations are not movable. But
    m68k doesn't even support MEMORY_HOTREMOVE, so there is no point in
    warning about it.

    Make the warning conditional so it only fires on configurations that
    care.

    Link: http://lkml.kernel.org/r/20180706061750.GH32658@dhcp22.suse.cz
    Signed-off-by: Michal Hocko
    Reported-by: Geert Uytterhoeven
    Tested-by: Geert Uytterhoeven
    Reviewed-by: Andrew Morton
    Cc: Vlastimil Babka
    Cc: Mike Rapoport
    Cc: Greg Ungerer
    Cc: Sam Creasey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

19 Jun, 2018

1 commit

  • The patch fixed a W=1 warning but broke the ia64 build:

    CC mm/memblock.o
    mm/memblock.c:1340: error: redefinition of `memblock_virt_alloc_try_nid_raw'
    ./include/linux/bootmem.h:335: error: previous definition of `memblock_virt_alloc_try_nid_raw' was here

    Because include/linux/bootmem.h says

    #if defined(CONFIG_HAVE_MEMBLOCK) && defined(CONFIG_NO_BOOTMEM)

    whereas mm/Makefile says

    obj-$(CONFIG_HAVE_MEMBLOCK) += memblock.o

    So revert 26f09e9b3a06 ("mm/memblock: add missing include
    <linux/bootmem.h>") while a full fix can be worked on.

    Fixes: 26f09e9b3a06 ("mm/memblock: add missing include <linux/bootmem.h>")
    Reported-by: Tony Luck
    Cc: Mathieu Malaterre
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

15 Jun, 2018

2 commits

  • mm/*.c files use symbolic and octal styles for permissions.

    Using octal and not symbolic permissions is preferred by many as more
    readable.

    https://lkml.org/lkml/2016/8/2/1945

    Prefer the direct use of octal for permissions.

    Done using
    $ scripts/checkpatch.pl -f --types=SYMBOLIC_PERMS --fix-inplace mm/*.c
    and some typing.

    Before: $ git grep -P -w "0[0-7]{3,3}" mm | wc -l
    44
    After: $ git grep -P -w "0[0-7]{3,3}" mm | wc -l
    86

    Miscellanea:

    o Whitespace neatening around these conversions.

    Link: http://lkml.kernel.org/r/2e032ef111eebcd4c5952bae86763b541d373469.1522102887.git.joe@perches.com
    Signed-off-by: Joe Perches
    Acked-by: David Rientjes
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Commit 26f09e9b3a06 ("mm/memblock: add memblock memory allocation apis")
    introduced two new function definitions:

    memblock_virt_alloc_try_nid_nopanic()
    memblock_virt_alloc_try_nid()

    Commit ea1f5f3712af ("mm: define memblock_virt_alloc_try_nid_raw")
    introduced the following function definition:

    memblock_virt_alloc_try_nid_raw()

    This commit adds an include of the <linux/bootmem.h> header file to
    provide the missing function prototypes. This silences the following
    gcc warning (W=1):

    mm/memblock.c:1334:15: warning: no previous prototype for `memblock_virt_alloc_try_nid_raw' [-Wmissing-prototypes]
    mm/memblock.c:1371:15: warning: no previous prototype for `memblock_virt_alloc_try_nid_nopanic' [-Wmissing-prototypes]
    mm/memblock.c:1407:15: warning: no previous prototype for `memblock_virt_alloc_try_nid' [-Wmissing-prototypes]

    Link: http://lkml.kernel.org/r/20180606194144.16990-1-malat@debian.org
    Signed-off-by: Mathieu Malaterre
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mathieu Malaterre
     

08 Jun, 2018

2 commits

  • The memblock_remove report is useful for seeing why MemTotal in
    /proc/meminfo differs between two kernels.

    Link: http://lkml.kernel.org/r/20180508104223.8028-1-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Reviewed-by: Andrew Morton
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • So far the code used ULLONG_MAX and a type cast to obtain a
    phys_addr_t with all bits set. The cast is necessary to silence
    compiler warnings on 32-bit platforms.

    Use the simpler but still type-safe approach "~(phys_addr_t)0" to
    create a preprocessor define for all bits set.

    Link: http://lkml.kernel.org/r/20180406213809.566-1-stefan@agner.ch
    Signed-off-by: Stefan Agner
    Suggested-by: Linus Torvalds
    Acked-by: Michal Hocko
    Reviewed-by: Andrew Morton
    Cc: Catalin Marinas
    Cc: Pavel Tatashin
    Cc: Ard Biesheuvel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stefan Agner
     

08 Apr, 2018

1 commit

  • Pull powerpc updates from Michael Ellerman:
    "Notable changes:

    - Support for 4PB user address space on 64-bit, opt-in via mmap().

    - Removal of POWER4 support, which was accidentally broken in 2016
    and no one noticed, and blocked use of some modern instructions.

    - Workarounds so that the hypervisor can enable Transactional Memory
    on Power9.

    - A series to disable the DAWR (Data Address Watchpoint Register) on
    Power9.

    - More information displayed in the meltdown/spectre_v1/v2 sysfs
    files.

    - A vpermxor (Power8 Altivec) implementation for the raid6 Q
    Syndrome.

    - A big series to make the allocation of our pacas (per cpu area),
    kernel page tables, and per-cpu stacks NUMA aware when using the
    Radix MMU on Power9.

    And as usual many fixes, reworks and cleanups.

    Thanks to: Aaro Koskinen, Alexandre Belloni, Alexey Kardashevskiy,
    Alistair Popple, Andy Shevchenko, Aneesh Kumar K.V, Anshuman Khandual,
    Balbir Singh, Benjamin Herrenschmidt, Christophe Leroy, Christophe
    Lombard, Cyril Bur, Daniel Axtens, Dave Young, Finn Thain, Frederic
    Barrat, Gustavo Romero, Horia Geantă, Jonathan Neuschäfer, Kees Cook,
    Larry Finger, Laurent Dufour, Laurent Vivier, Logan Gunthorpe,
    Madhavan Srinivasan, Mark Greer, Mark Hairgrove, Markus Elfring,
    Mathieu Malaterre, Matt Brown, Matt Evans, Mauricio Faria de Oliveira,
    Michael Neuling, Naveen N. Rao, Nicholas Piggin, Paul Mackerras,
    Philippe Bergheaud, Ram Pai, Rob Herring, Sam Bobroff, Segher
    Boessenkool, Simon Guo, Simon Horman, Stewart Smith, Sukadev
    Bhattiprolu, Suraj Jitindar Singh, Thiago Jung Bauermann, Vaibhav
    Jain, Vaidyanathan Srinivasan, Vasant Hegde, Wei Yongjun"

    * tag 'powerpc-4.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (207 commits)
    powerpc/64s/idle: Fix restore of AMOR on POWER9 after deep sleep
    powerpc/64s: Fix POWER9 DD2.2 and above in cputable features
    powerpc/64s: Fix pkey support in dt_cpu_ftrs, add CPU_FTR_PKEY bit
    powerpc/64s: Fix dt_cpu_ftrs to have restore_cpu clear unwanted LPCR bits
    Revert "powerpc/64s/idle: POWER9 ESL=0 stop avoid save/restore overhead"
    powerpc: iomap.c: introduce io{read|write}64_{lo_hi|hi_lo}
    powerpc: io.h: move iomap.h include so that it can use readq/writeq defs
    cxl: Fix possible deadlock when processing page faults from cxllib
    powerpc/hw_breakpoint: Only disable hw breakpoint if cpu supports it
    powerpc/mm/radix: Update command line parsing for disable_radix
    powerpc/mm/radix: Parse disable_radix commandline correctly.
    powerpc/mm/hugetlb: initialize the pagetable cache correctly for hugetlb
    powerpc/mm/radix: Update pte fragment count from 16 to 256 on radix
    powerpc/mm/keys: Update documentation and remove unnecessary check
    powerpc/64s/idle: POWER9 ESL=0 stop avoid save/restore overhead
    powerpc/64s/idle: Consolidate power9_offline_stop()/power9_idle_stop()
    powerpc/powernv: Always stop secondaries before reboot/shutdown
    powerpc: hard disable irqs in smp_send_stop loop
    powerpc: use NMI IPI for smp_send_stop
    powerpc/powernv: Fix SMT4 forcing idle code
    ...

    Linus Torvalds
     

06 Apr, 2018

5 commits

  • This fixes a warning shown when phys_addr_t is 32-bit int when compiling
    with clang:

    mm/memblock.c:927:15: warning: implicit conversion from 'unsigned long long'
    to 'phys_addr_t' (aka 'unsigned int') changes value from
    18446744073709551615 to 4294967295 [-Wconstant-conversion]
    r->base : ULLONG_MAX;
    ^~~~~~~~~~
    ./include/linux/kernel.h:30:21: note: expanded from macro 'ULLONG_MAX'
    #define ULLONG_MAX (~0ULL)
    ^~~~~

    Link: http://lkml.kernel.org/r/20180319005645.29051-1-stefan@agner.ch
    Signed-off-by: Stefan Agner
    Reviewed-by: Andrew Morton
    Cc: Michal Hocko
    Cc: Catalin Marinas
    Cc: Pavel Tatashin
    Cc: Ard Biesheuvel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stefan Agner
     
  • Currently <linux/slab.h> #includes <linux/kmemleak.h> for no obvious
    reason. It looks like it's only a convenience, so remove kmemleak.h
    from slab.h and add <linux/kmemleak.h> to any users of kmemleak_* that
    don't already #include it. Also remove <linux/kmemleak.h> from source
    files that do not use it.

    This is tested on i386 allmodconfig and x86_64 allmodconfig. It would
    be good to run it through the 0day bot for other $ARCHes. I have
    neither the horsepower nor the storage space for the other $ARCHes.

    Update: This patch has been extensively build-tested by both the 0day
    bot & kisskb/ozlabs build farms. Both of them reported 2 build failures
    for which patches are included here (in v2).

    [ slab.h is the second most used header file after module.h; kernel.h is
    right there with slab.h. There could be some minor error in the
    counting due to some #includes having comments after them and I didn't
    combine all of those. ]

    [akpm@linux-foundation.org: security/keys/big_key.c needs vmalloc.h, per sfr]
    Link: http://lkml.kernel.org/r/e4309f98-3749-93e1-4bb7-d9501a39d015@infradead.org
    Link: http://kisskb.ellerman.id.au/kisskb/head/13396/
    Signed-off-by: Randy Dunlap
    Reviewed-by: Ingo Molnar
    Reported-by: Michael Ellerman [2 build failures]
    Reported-by: Fengguang Wu [2 build failures]
    Reviewed-by: Andrew Morton
    Cc: Wei Yongjun
    Cc: Luis R. Rodriguez
    Cc: Greg Kroah-Hartman
    Cc: Mimi Zohar
    Cc: John Johansen
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • ...instead of open coding file operations followed by custom ->open()
    callbacks for each attribute.

    [andriy.shevchenko@linux.intel.com: add tags, fix compilation issue]
    Link: http://lkml.kernel.org/r/20180217144253.58604-1-andriy.shevchenko@linux.intel.com
    Link: http://lkml.kernel.org/r/20180214154644.54505-1-andriy.shevchenko@linux.intel.com
    Signed-off-by: Andy Shevchenko
    Reviewed-by: Matthew Wilcox
    Reviewed-by: Andrew Morton
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Christoph Lameter
    Cc: Tejun Heo
    Cc: Dennis Zhou
    Cc: Minchan Kim
    Cc: Nitin Gupta
    Cc: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Shevchenko
     
  • During boot we poison struct page memory in order to ensure that no one
    is accessing this memory until the struct pages are initialized in
    __init_single_page().

    This patch adds more scrutiny to this checking by making sure that flags
    do not equal the poison pattern when they are accessed. The pattern is
    all ones.

    Since the node id is also stored in struct page, and may be accessed
    quite early, we add this enforcement to the page_to_nid() function as
    well. Note, this is applicable only when NODE_NOT_IN_PAGE_FLAGS=n.

    [pasha.tatashin@oracle.com: v4]
    Link: http://lkml.kernel.org/r/20180215165920.8570-4-pasha.tatashin@oracle.com
    Link: http://lkml.kernel.org/r/20180213193159.14606-4-pasha.tatashin@oracle.com
    Signed-off-by: Pavel Tatashin
    Reviewed-by: Ingo Molnar
    Acked-by: Michal Hocko
    Cc: Baoquan He
    Cc: Bharata B Rao
    Cc: Daniel Jordan
    Cc: Dan Williams
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Kirill A. Shutemov
    Cc: Mel Gorman
    Cc: Steven Sistare
    Cc: Thomas Gleixner
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Tatashin
     
  • Deferred page initialization allows the boot cpu to initialize a small
    subset of the system's pages early in boot, with other cpus doing the
    rest later on.

    It is, however, problematic to know how many pages the kernel needs
    during boot. Different modules and kernel parameters may change the
    requirement, so the boot cpu either initializes too many pages or runs
    out of memory.

    To fix that, initialize early pages on demand. This ensures the kernel
    does the minimum amount of work to initialize pages during boot and
    leaves the rest to be divided in the multithreaded initialization path
    (deferred_init_memmap).

    The on-demand code is permanently disabled using static branching once
    deferred pages are initialized. After the static branch is changed to
    false, the overhead is up to two branch-always instructions if the
    zone watermark check fails or if rmqueue fails.

    Sergey Senozhatsky noticed that while deferred pages currently make
    sense only on NUMA machines (we start one thread per latency node),
    CONFIG_NUMA is not a requirement for CONFIG_DEFERRED_STRUCT_PAGE_INIT,
    so that also had to be addressed in the patch.

    [akpm@linux-foundation.org: fix typo in comment, make deferred_pages static]
    [pasha.tatashin@oracle.com: fix min() type mismatch warning]
    Link: http://lkml.kernel.org/r/20180212164543.26592-1-pasha.tatashin@oracle.com
    [pasha.tatashin@oracle.com: use zone_to_nid() in deferred_grow_zone()]
    Link: http://lkml.kernel.org/r/20180214163343.21234-2-pasha.tatashin@oracle.com
    [pasha.tatashin@oracle.com: might_sleep warning]
    Link: http://lkml.kernel.org/r/20180306192022.28289-1-pasha.tatashin@oracle.com
    [akpm@linux-foundation.org: s/spin_lock/spin_lock_irq/ in page_alloc_init_late()]
    [pasha.tatashin@oracle.com: v5]
    Link: http://lkml.kernel.org/r/20180309220807.24961-3-pasha.tatashin@oracle.com
    [akpm@linux-foundation.org: tweak comments]
    [pasha.tatashin@oracle.com: v6]
    Link: http://lkml.kernel.org/r/20180313182355.17669-3-pasha.tatashin@oracle.com
    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/20180209192216.20509-2-pasha.tatashin@oracle.com
    Signed-off-by: Pavel Tatashin
    Reviewed-by: Daniel Jordan
    Reviewed-by: Steven Sistare
    Reviewed-by: Andrew Morton
    Tested-by: Masayoshi Mizuma
    Acked-by: Mel Gorman
    Cc: Michal Hocko
    Cc: Catalin Marinas
    Cc: AKASHI Takahiro
    Cc: Gioh Kim
    Cc: Heiko Carstens
    Cc: Yaowei Bai
    Cc: Wei Yang
    Cc: Paul Burton
    Cc: Miles Chen
    Cc: Vlastimil Babka
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Tatashin
     

23 Mar, 2018

1 commit

  • This reverts commit b92df1de5d28 ("mm: page_alloc: skip over regions
    of invalid pfns where possible"). The commit was meant to be a boot
    init speed-up, skipping the loop in memmap_init_zone() for invalid
    pfns.

    But given some specific memory mapping on x86_64 (or more generally
    theoretically anywhere but on arm with CONFIG_HAVE_ARCH_PFN_VALID) the
    implementation also skips valid pfns which is plain wrong and causes
    'kernel BUG at mm/page_alloc.c:1389!'

    crash> log | grep -e BUG -e RIP -e Call.Trace -e move_freepages_block -e rmqueue -e freelist -A1
    kernel BUG at mm/page_alloc.c:1389!
    invalid opcode: 0000 [#1] SMP
    --
    RIP: 0010: move_freepages+0x15e/0x160
    --
    Call Trace:
    move_freepages_block+0x73/0x80
    __rmqueue+0x263/0x460
    get_page_from_freelist+0x7e1/0x9e0
    __alloc_pages_nodemask+0x176/0x420
    --

    crash> page_init_bug -v | grep RAM
    1000 - 9bfff System RAM (620.00 KiB)
    100000 - 430bffff System RAM ( 1.05 GiB = 1071.75 MiB = 1097472.00 KiB)
    4b0c8000 - 4bf9cfff System RAM ( 14.83 MiB = 15188.00 KiB)
    4bfac000 - 646b1fff System RAM (391.02 MiB = 400408.00 KiB)
    7b788000 - 7b7fffff System RAM (480.00 KiB)
    100000000 - 67fffffff System RAM ( 22.00 GiB)

    crash> page_init_bug | head -6
    7b788000 - 7b7fffff System RAM (480.00 KiB)
    1fffff00000000 0 1 DMA32 4096 1048575
    505736 505344 505855
    0 0 0 DMA 1 4095
    1fffff00000400 0 1 DMA32 4096 1048575
    BUG, zones differ!

    crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b787000 7b788000
    PAGE PHYSICAL MAPPING INDEX CNT FLAGS
    ffffea0001e00000 78000000 0 0 0 0
    ffffea0001ed7fc0 7b5ff000 0 0 0 0
    ffffea0001ed8000 7b600000 0 0 0 0 <<<<
    ffffea0001ede1c0 7b787000 0 0 0 0
    ffffea0001ede200 7b788000 0 0 1 1fffff00000000

    Link: http://lkml.kernel.org/r/20180316143855.29838-1-neelx@redhat.com
    Fixes: b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns where possible")
    Signed-off-by: Daniel Vacek
    Acked-by: Ard Biesheuvel
    Acked-by: Michal Hocko
    Reviewed-by: Andrew Morton
    Cc: Vlastimil Babka
    Cc: Mel Gorman
    Cc: Pavel Tatashin
    Cc: Paul Burton
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Vacek
     

10 Mar, 2018

1 commit

  • This is just a cleanup. It aids handling the special end case in the
    next commit.

    [akpm@linux-foundation.org: make it work against current -linus, not against -mm]
    [akpm@linux-foundation.org: make it work against current -linus, not against -mm some more]
    Link: http://lkml.kernel.org/r/1ca478d4269125a99bcfb1ca04d7b88ac1aee924.1520011944.git.neelx@redhat.com
    Signed-off-by: Daniel Vacek
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Cc: Mel Gorman
    Cc: Pavel Tatashin
    Cc: Paul Burton
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Vacek
     

07 Feb, 2018

1 commit

  • Make memblock_is_map_memory() and memblock_is_region_memory() return
    bool, since these two functions only use true or false as their
    return values.

    No functional change.

    Link: http://lkml.kernel.org/r/1513266622-15860-2-git-send-email-baiyaowei@cmss.chinamobile.com
    Signed-off-by: Yaowei Bai
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yaowei Bai
     

16 Nov, 2017

2 commits

  • * A new variant of memblock_virt_alloc_* allocations:
    memblock_virt_alloc_try_nid_raw()
    - Does not zero the allocated memory
    - Does not panic if request cannot be satisfied

    * optimize early system hash allocations

    Clients can call alloc_large_system_hash() with flag: HASH_ZERO to
    specify that memory that was allocated for system hash needs to be
    zeroed, otherwise the memory does not need to be zeroed, and client will
    initialize it.

    If the memory does not need to be zeroed, call the new
    memblock_virt_alloc_raw() interface, and thus improve boot
    performance.

    * debug for raw allocator

    When CONFIG_DEBUG_VM is enabled, this patch sets all the memory that
    is returned by memblock_virt_alloc_try_nid_raw() to ones, to ensure
    that no callers expect zeroed memory.

    Link: http://lkml.kernel.org/r/20171013173214.27300-6-pasha.tatashin@oracle.com
    Signed-off-by: Pavel Tatashin
    Reviewed-by: Steven Sistare
    Reviewed-by: Daniel Jordan
    Reviewed-by: Bob Picco
    Tested-by: Bob Picco
    Acked-by: Michal Hocko
    Cc: Alexander Potapenko
    Cc: Andrey Ryabinin
    Cc: Ard Biesheuvel
    Cc: Catalin Marinas
    Cc: Christian Borntraeger
    Cc: David S. Miller
    Cc: Dmitry Vyukov
    Cc: Heiko Carstens
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Mark Rutland
    Cc: Matthew Wilcox
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Sam Ravnborg
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Tatashin
     
  • The for_each_memblock_type macro relies on an idx variable defined in
    the caller's context. Silent macro arguments are almost always the
    wrong thing to do: they make code harder to read and easier to get
    wrong. Let's use an explicit iterator parameter for
    for_each_memblock_type and make the code more obvious. This patch is
    a mere cleanup and shouldn't introduce any functional change.

    Link: http://lkml.kernel.org/r/20170913133029.28911-1-gi-oh.kim@profitbricks.com
    Signed-off-by: Gioh Kim
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gioh Kim
     

26 Aug, 2017

1 commit

  • The recently introduced memblock_discard() has a reversed-logic bug:
    the static array is freed instead of the dynamically allocated one.

    Link: http://lkml.kernel.org/r/1503511441-95478-2-git-send-email-pasha.tatashin@oracle.com
    Fixes: 3010f876500f ("mm: discard memblock data later")
    Signed-off-by: Pavel Tatashin
    Reported-by: Woody Suwalski
    Tested-by: Woody Suwalski
    Acked-by: Michal Hocko
    Cc: Vlastimil Babka
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Tatashin
     

19 Aug, 2017

1 commit

  • There is an existing use-after-free bug when deferred struct pages
    are enabled:

    The memblock_add() allocates memory for the memory array if more than
    128 entries are needed. See comment in e820__memblock_setup():

    * The bootstrap memblock region count maximum is 128 entries
    * (INIT_MEMBLOCK_REGIONS), but EFI might pass us more E820 entries
    * than that - so allow memblock resizing.

    This memblock memory is freed here:
    free_low_memory_core_early()

    We access the freed memblock.memory later in boot when deferred pages
    are initialized in this path:

    deferred_init_memmap()
    for_each_mem_pfn_range()
    __next_mem_pfn_range()
    type = &memblock.memory;

    One possible explanation for why this use-after-free hasn't been hit
    before is that the limit of INIT_MEMBLOCK_REGIONS has never been
    exceeded at least on systems where deferred struct pages were enabled.

    Tested by reducing INIT_MEMBLOCK_REGIONS down to 4 from the current
    128, and verifying in qemu that this code is getting executed and
    that the freed pages are sane.

    Link: http://lkml.kernel.org/r/1502485554-318703-2-git-send-email-pasha.tatashin@oracle.com
    Fixes: 7e18adb4f80b ("mm: meminit: initialise remaining struct pages in parallel with kswapd")
    Signed-off-by: Pavel Tatashin
    Reviewed-by: Steven Sistare
    Reviewed-by: Daniel Jordan
    Reviewed-by: Bob Picco
    Acked-by: Michal Hocko
    Cc: Mel Gorman
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Tatashin
     

07 Jul, 2017

2 commits

  • movable_node_is_enabled is defined in memblock proper while it is
    initialized from the memory hotplug proper. This is quite messy and it
    makes a dependency between the two so move movable_node along with the
    helper functions to memory_hotplug.

    To make it more entertaining, the kernel parameter is ignored unless
    CONFIG_HAVE_MEMBLOCK_NODE_MAP=y, because we do not have the node
    information for each memblock otherwise. So let's warn when the
    option is disabled.

    Link: http://lkml.kernel.org/r/20170529114141.536-4-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Acked-by: Reza Arbab
    Acked-by: Vlastimil Babka
    Cc: Mel Gorman
    Cc: Andrea Arcangeli
    Cc: Jerome Glisse
    Cc: Yasuaki Ishimatsu
    Cc: Xishi Qiu
    Cc: Kani Toshimitsu
    Cc: Chen Yucong
    Cc: Joonsoo Kim
    Cc: Andi Kleen
    Cc: David Rientjes
    Cc: Daniel Kiper
    Cc: Igor Mammedov
    Cc: Vitaly Kuznetsov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • Commit 20b2f52b73fe ("numa: add CONFIG_MOVABLE_NODE for
    movable-dedicated node") has introduced CONFIG_MOVABLE_NODE without a
    good explanation on why it is actually useful.

    It makes a lot of sense to make the movable node semantic opt-in, but
    we already have that because the feature has to be explicitly enabled
    on the kernel command line. A config option on top only makes the
    configuration space larger without a good reason. It also adds
    additional ifdefery that pollutes the code.

    Just drop the config option and make it de-facto always enabled. This
    shouldn't introduce any change to the semantic.

    Link: http://lkml.kernel.org/r/20170529114141.536-3-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Acked-by: Reza Arbab
    Acked-by: Vlastimil Babka
    Cc: Mel Gorman
    Cc: Andrea Arcangeli
    Cc: Jerome Glisse
    Cc: Yasuaki Ishimatsu
    Cc: Xishi Qiu
    Cc: Kani Toshimitsu
    Cc: Chen Yucong
    Cc: Joonsoo Kim
    Cc: Andi Kleen
    Cc: David Rientjes
    Cc: Daniel Kiper
    Cc: Igor Mammedov
    Cc: Vitaly Kuznetsov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

03 Jun, 2017

1 commit

  • We have seen an early OOM killer invocation on ppc64 systems with
    crashkernel=4096M:

    kthreadd invoked oom-killer: gfp_mask=0x16040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK), nodemask=7, order=0, oom_score_adj=0
    kthreadd cpuset=/ mems_allowed=7
    CPU: 0 PID: 2 Comm: kthreadd Not tainted 4.4.68-1.gd7fe927-default #1
    Call Trace:
    dump_stack+0xb0/0xf0 (unreliable)
    dump_header+0xb0/0x258
    out_of_memory+0x5f0/0x640
    __alloc_pages_nodemask+0xa8c/0xc80
    kmem_getpages+0x84/0x1a0
    fallback_alloc+0x2a4/0x320
    kmem_cache_alloc_node+0xc0/0x2e0
    copy_process.isra.25+0x260/0x1b30
    _do_fork+0x94/0x470
    kernel_thread+0x48/0x60
    kthreadd+0x264/0x330
    ret_from_kernel_thread+0x5c/0xa4

    Mem-Info:
    active_anon:0 inactive_anon:0 isolated_anon:0
    active_file:0 inactive_file:0 isolated_file:0
    unevictable:0 dirty:0 writeback:0 unstable:0
    slab_reclaimable:5 slab_unreclaimable:73
    mapped:0 shmem:0 pagetables:0 bounce:0
    free:0 free_pcp:0 free_cma:0
    Node 7 DMA free:0kB min:0kB low:0kB high:0kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:52428800kB managed:110016kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:320kB slab_unreclaimable:4672kB kernel_stack:1152kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
    lowmem_reserve[]: 0 0 0 0
    Node 7 DMA: 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 0kB
    0 total pagecache pages
    0 pages in swap cache
    Swap cache stats: add 0, delete 0, find 0/0
    Free swap = 0kB
    Total swap = 0kB
    819200 pages RAM
    0 pages HighMem/MovableOnly
    817481 pages reserved
    0 pages cma reserved
    0 pages hwpoisoned

    The reason is that the managed memory is too low (only 110MB) while
    the rest of the 50GB is still waiting for the deferred initialization
    to be done. update_defer_init estimates the initial memory to
    initialize to be at least 2GB, but it doesn't consider any memory
    allocated in that range. In this particular case we've had

    Reserving 4096MB of memory at 128MB for crashkernel (System RAM: 51200MB)

    so the low 2GB is mostly depleted.

    Fix this by considering memblock allocations in the initial static
    initialization estimation. Move the max_initialise computation to
    reset_deferred_meminit and implement a simple memblock_reserved_memory
    helper which iterates all reserved blocks and sums the sizes of all
    that start below the given address. The cumulative size is then added
    on top of the initial estimation. This is still not ideal because
    reset_deferred_meminit doesn't consider holes, so a reservation might
    lie above the initial estimation, which we ignore, but let's keep the
    logic simple until we really need to handle more complicated cases.

    Fixes: 3a80a7fa7989 ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set")
    Link: http://lkml.kernel.org/r/20170531104010.GI27783@dhcp22.suse.cz
    Signed-off-by: Michal Hocko
    Acked-by: Mel Gorman
    Tested-by: Srikar Dronamraju
    Cc: [4.2+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

06 Apr, 2017

2 commits

    Add memblock_cap_memory_range(), which removes all memblock regions
    except those within the memory range specified in the arguments. In
    addition, memblock_mem_limit_remove_map() is reworked and re-implemented
    on top of memblock_cap_memory_range().

    Like memblock_mem_limit_remove_map(), this function does not remove
    memblocks with the MEMBLOCK_NOMAP attribute, as they may be mapped and
    accessed later as "device memory."
    See commit a571d4eb55d8 ("mm/memblock.c: add new infrastructure to
    address the mem limit issue").

    This function is used, in a succeeding patch in this series of arm64
    kdump support, to limit the range of usable memory, or System RAM, in
    the crash dump kernel.
    (Please note that the "mem=" parameter is of little use for this
    purpose.)

    Signed-off-by: AKASHI Takahiro
    Reviewed-by: Will Deacon
    Acked-by: Catalin Marinas
    Acked-by: Dennis Chen
    Cc: linux-mm@kvack.org
    Cc: Andrew Morton
    Reviewed-by: Ard Biesheuvel
    Signed-off-by: Catalin Marinas

    AKASHI Takahiro
     
    This function, in combination with memblock_mark_nomap(), will be used
    in a later kdump patch for arm64 when it temporarily isolates some range
    of memory from the other memory blocks in order to create a specific
    kernel mapping at boot time.

    Signed-off-by: AKASHI Takahiro
    Reviewed-by: Ard Biesheuvel
    Signed-off-by: Catalin Marinas

    AKASHI Takahiro
     

10 Mar, 2017

1 commit

    Obviously, we should not access memblock.memory.regions[right] if
    'right' is outside of [0..memblock.memory.cnt).

    Fixes: b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns where possible")
    Link: http://lkml.kernel.org/r/20170303023745.9104-1-takahiro.akashi@linaro.org
    Signed-off-by: AKASHI Takahiro
    Cc: Paul Burton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    AKASHI Takahiro
     

25 Feb, 2017

3 commits

    Provide the name of each memblock type in struct memblock_type. This
    allows us to get rid of the memblock_type_name() function and avoids
    duplicating the type names in __memblock_dump_all().

    The only memblock_type usage outside mm/memblock.c seems to be in
    arch/s390/kernel/crash_dump.c. While at it, give that one a name too.

    Link: http://lkml.kernel.org/r/20170120123456.46508-4-heiko.carstens@de.ibm.com
    Signed-off-by: Heiko Carstens
    Cc: Philipp Hachtmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heiko Carstens
     
  • Since commit 70210ed950b5 ("mm/memblock: add physical memory list") the
    memblock structure knows about a physical memory list.

    The physical memory list should also be dumped when memblock_dump_all()
    is called with memblock_debug switched on. This makes debugging a bit
    easier.

    Link: http://lkml.kernel.org/r/20170120123456.46508-3-heiko.carstens@de.ibm.com
    Signed-off-by: Heiko Carstens
    Cc: Philipp Hachtmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heiko Carstens
     
  • Since commit 70210ed950b5 ("mm/memblock: add physical memory list") the
    memblock structure knows about a physical memory list.

    memblock_type_name() should return "physmem" instead of "unknown" if the
    name of the physmem memblock_type is being asked for.

    Link: http://lkml.kernel.org/r/20170120123456.46508-2-heiko.carstens@de.ibm.com
    Signed-off-by: Heiko Carstens
    Cc: Philipp Hachtmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heiko Carstens
     

23 Feb, 2017

4 commits

    There is no variable named flags in memblock_add() and
    memblock_reserve(), so remove it from the log messages.

    This patch also cleans up the type casting for phys_addr_t by using
    %pa to print such addresses.

    Link: http://lkml.kernel.org/r/1484720165-25403-1-git-send-email-miles.chen@mediatek.com
    Signed-off-by: Miles Chen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miles Chen
     
    memblock_reserve() adds a new range to memblock.reserved when the new
    range is not totally covered by any of the current memblock.reserved
    ranges. If memblock.reserved is full and can't be resized,
    memblock_reserve() fails.

    This doesn't happen in the real world right now; I observed it during
    code review. Theoretically, though, it has a chance to happen, and if
    it does, others would think this range of memory is still available
    and may corrupt the memory.

    This patch checks the return value and goes to "done" only after the
    insertion succeeds.

    Link: http://lkml.kernel.org/r/1482363033-24754-3-git-send-email-richard.weiyang@gmail.com
    Signed-off-by: Wei Yang
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wei Yang
     
    memblock_is_region_memory() invokes memblock_search() to see whether
    the base address is in a memory region. If the search fails, idx is
    -1 and the function returns 0.

    If memblock_search() returns a valid index, the base address is
    guaranteed to be within the range of memblock.memory.regions[idx].
    Because of this, it is not necessary to check the base again.

    This patch removes the check on "base".

    Link: http://lkml.kernel.org/r/1482363033-24754-2-git-send-email-richard.weiyang@gmail.com
    Signed-off-by: Wei Yang
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wei Yang
     
    When using a sparse memory model, memmap_init_zone(), when invoked
    with the MEMMAP_EARLY context, will skip over pages which aren't
    valid, i.e. which aren't in a populated region of the sparse memory
    map. However, if the memory map is extremely sparse, it can spend a
    long time linearly checking each PFN in a large non-populated region
    of the memory map and skipping each one in turn.

    When CONFIG_HAVE_MEMBLOCK_NODE_MAP is enabled, we have sufficient
    information to quickly discover the next valid PFN given an invalid
    one by searching through the list of memory regions and skipping
    forwards to the first PFN covered by the memory region to the right
    of the non-populated region. Implement this in order to speed up
    memmap_init_zone() for systems with extremely sparse memory maps.

    James said: "I have tested this patch on a virtual model of a Samurai
    CPU with a sparse memory map. The kernel boot time drops from 109 to
    62 seconds."

    Link: http://lkml.kernel.org/r/20161125185518.29885-1-paul.burton@imgtec.com
    Signed-off-by: Paul Burton
    Tested-by: James Hartley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Burton