18 Mar, 2016

1 commit

  • Most of the mm subsystem uses pr_<level>, so make it consistent.

    Miscellanea:

    - Realign arguments
    - Add missing newline to format
    - kmemleak-test.c has a "kmemleak: " prefix added to the
      "Kmemleak testing" logging message via pr_fmt (sketched below)

    Signed-off-by: Joe Perches
    Acked-by: Tejun Heo [percpu]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     

01 Jul, 2015

2 commits

  • mminit_verify_page_links() is an extremely paranoid check that was
    introduced when memory initialisation was being heavily reworked.
    Profiles indicated that up to 10% of parallel memory initialisation was
    spent on checking this for every page. The cost could be reduced but in
    practice this check only found problems very early during the
    initialisation rewrite and has found nothing since. This patch removes the
    expensive, unnecessary check.

    Signed-off-by: Mel Gorman
    Tested-by: Nate Zimmer
    Tested-by: Waiman Long
    Tested-by: Daniel J Blueman
    Acked-by: Pekka Enberg
    Cc: Robin Holt
    Cc: Nate Zimmer
    Cc: Dave Hansen
    Cc: Waiman Long
    Cc: Scott Norton
    Cc: "Luck, Tony"
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Only a subset of struct pages are initialised at the moment. When this
    patch is applied, kswapd initialises the remaining struct pages in
    parallel.

    This should boot faster by spreading the work to multiple CPUs and
    initialising data that is local to the CPU. The user-visible effect on
    large machines is that free memory will appear to rapidly increase early
    in the lifetime of the system until kswapd reports that all memory is
    initialised in the kernel log. Once initialised, there should be no other
    user-visible effects.
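
    A schematic sketch of the idea (function and call shape assumed from the
    series, not the exact patch):

        /* Each node's kswapd initialises that node's remaining struct
         * pages before entering its reclaim loop, so the work runs in
         * parallel across nodes and stays NUMA-local. */
        static int kswapd(void *p)
        {
                pg_data_t *pgdat = (pg_data_t *)p;

                deferred_init_memmap(pgdat->node_id);

                /* ... normal kswapd balancing loop follows ... */
        }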

    Signed-off-by: Mel Gorman
    Tested-by: Nate Zimmer
    Tested-by: Waiman Long
    Tested-by: Daniel J Blueman
    Acked-by: Pekka Enberg
    Cc: Robin Holt
    Cc: Nate Zimmer
    Cc: Dave Hansen
    Cc: Waiman Long
    Cc: Scott Norton
    Cc: "Luck, Tony"
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

13 Feb, 2015

2 commits

  • mminit_loglevel is only referenced from __init and __meminit functions, so
    we can mark it __meminitdata.
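
    In sketch form, the declaration simply gains the annotation:

        /* placed in .meminit.data; freed after boot when memory
         * hotplug is not configured */
        int __meminitdata mminit_loglevel;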

    Signed-off-by: Rasmus Villemoes
    Cc: Vlastimil Babka
    Cc: Rik van Riel
    Cc: Joonsoo Kim
    Cc: David Rientjes
    Cc: Vishnu Pratap Singh
    Cc: Pintu Kumar
    Cc: Michal Nazarewicz
    Cc: Mel Gorman
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Tim Chen
    Cc: Hugh Dickins
    Cc: Li Zefan
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rasmus Villemoes
     
  • The only caller of mminit_verify_zonelist is build_all_zonelists_init,
    which is annotated with __init, so it should be safe to also mark the
    former as __init, saving ~400 bytes of .text.
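
    Sketch of the change:

        /* __init places the function in .init.text, which is freed
         * once boot completes */
        void __init mminit_verify_zonelist(void);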

    Signed-off-by: Rasmus Villemoes
    Cc: Vlastimil Babka
    Cc: Rik van Riel
    Cc: Joonsoo Kim
    Cc: David Rientjes
    Cc: Vishnu Pratap Singh
    Cc: Pintu Kumar
    Cc: Michal Nazarewicz
    Cc: Mel Gorman
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Tim Chen
    Cc: Hugh Dickins
    Cc: Li Zefan
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rasmus Villemoes
     

28 Jan, 2014

1 commit

  • Commit da29bd36224b ("mm/mm_init.c: make creation of the mm_kobj happen
    earlier than device_initcall") changed to pure_initcall(mm_sysfs_init).

    That's too early: mm_sysfs_init() depends on core_initcall(ksysfs_init)
    to have made the kernel_kobj directory "kernel" in which to create "mm".

    Make it postcore_initcall(mm_sysfs_init). We could use core_initcall(),
    and depend upon Makefile link order kernel/ mm/ fs/ ipc/ security/ ...
    as core_initcall(debugfs_init) and core_initcall(securityfs_init) do;
    but better not.
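
    The fix, in sketch form (postcore, level 2, runs after core_initcall
    level 1, by which point ksysfs_init() has created kernel_kobj):

        postcore_initcall(mm_sysfs_init);  /* was: pure_initcall(mm_sysfs_init) */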

    Signed-off-by: Hugh Dickins
    Acked-by: Paul Gortmaker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

24 Jan, 2014

1 commit

  • The use of __initcall is to be eventually replaced by choosing one from
    the prioritized groupings laid out in init.h header:

    pure_initcall 0
    core_initcall 1
    postcore_initcall 2
    arch_initcall 3
    subsys_initcall 4
    fs_initcall 5
    device_initcall 6
    late_initcall 7

    In the interim, all __initcall uses are mapped onto device_initcall, which,
    as can be seen above, comes quite late in the ordering.

    Currently the mm_kobj is created with __initcall in mm_sysfs_init().
    This means that any other initcalls that want to reference the mm_kobj
    have to be device_initcall (or later), otherwise we will, for example,
    trip the BUG_ON(!kobj) in sysfs's internal_create_group(). This
    unfairly restricts those users; for example something that clearly makes
    sense to be an arch_initcall will not be able to choose that.

    However, upon examination, it is only this way for historical reasons
    (i.e. simply not reprioritized yet). We see that sysfs is ready quite
    early in init/main.c via:

    vfs_caches_init
      |_ mnt_init
         |_ sysfs_init

    well ahead of the processing of the prioritized calls listed above.

    So we can recategorize mm_sysfs_init to be a pure_initcall, which in
    turn allows any mm_kobj initcall users a wider range (1 --> 7) of
    initcall priorities to choose from.
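
    A sketch of the recategorised initcall, following the shape of
    mm_sysfs_init() in mm/mm_init.c (reconstructed; details may differ):

        static int __init mm_sysfs_init(void)
        {
                mm_kobj = kobject_create_and_add("mm", kernel_kobj);
                if (!mm_kobj)
                        return -ENOMEM;
                return 0;
        }
        pure_initcall(mm_sysfs_init);   /* was: __initcall(mm_sysfs_init) */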

    Signed-off-by: Paul Gortmaker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Gortmaker
     

09 Oct, 2013

2 commits

  • Change the per-page last fault tracking to use cpu,pid instead of
    nid,pid. This will allow us to try to look up the alternate task more
    easily. Note that even though it is the cpu that is stored in the page
    flags, the mpol_misplaced decision is still based on the node.
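
    A sketch of the packing, using the mask/shift names this change adds to
    include/linux/mm.h (field widths depend on the kernel configuration):

        #define LAST__PID_SHIFT 8
        #define LAST__PID_MASK  ((1 << LAST__PID_SHIFT) - 1)
        #define LAST__CPU_SHIFT NR_CPUS_BITS
        #define LAST__CPU_MASK  ((1 << LAST__CPU_SHIFT) - 1)

        static inline int cpu_pid_to_cpupid(int cpu, int pid)
        {
                return ((cpu & LAST__CPU_MASK) << LAST__PID_SHIFT) |
                       (pid & LAST__PID_MASK);
        }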

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Mel Gorman
    Reviewed-by: Rik van Riel
    Cc: Andrea Arcangeli
    Cc: Johannes Weiner
    Cc: Srikar Dronamraju
    Link: http://lkml.kernel.org/r/1381141781-10992-43-git-send-email-mgorman@suse.de
    [ Fixed build failure on 32-bit systems. ]
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Ideally it would be possible to distinguish between NUMA hinting faults that
    are private to a task and those that are shared. If treated identically
    there is a risk that shared pages bounce between nodes depending on
    the order they are referenced by tasks. Ultimately what is desirable is
    that task private pages remain local to the task while shared pages are
    interleaved between sharing tasks running on different nodes to give good
    average performance. This is further complicated by THP as even
    applications that partition their data may not be partitioning on a huge
    page boundary.

    To start with, this patch assumes that multi-threaded or multi-process
    applications partition their data and that, in general, the private
    accesses are more important for cpu->memory locality. Also,
    no new infrastructure is required to treat private pages properly but
    interleaving for shared pages requires additional infrastructure.

    To detect private accesses the pid of the last accessing task is required
    but the storage requirements are high. This patch borrows heavily from
    Ingo Molnar's patch "numa, mm, sched: Implement last-CPU+PID hash tracking"
    to encode some bits from the last accessing task in the page flags as
    well as the node information. Collisions will occur but it is better than
    just depending on the node information. Node information is then used to
    determine if a page needs to migrate. The PID information is used to detect
    private/shared accesses. The preferred NUMA node is selected based on where
    the maximum number of approximately private faults were measured. Shared
    faults are not taken into consideration for a few reasons.

    First, if there are many tasks sharing the page then they'll all move
    towards the same node. The node will be compute overloaded and then
    scheduled away later only to bounce back again. Alternatively the shared
    tasks would just bounce around nodes because the fault information is
    effectively noise. Either way accounting for shared faults the same as
    private faults can result in lower performance overall.

    The second reason is based on a hypothetical workload that has a small
    number of very important, heavily accessed private pages but a large shared
    array. The shared array would dominate the number of faults and be selected
    as a preferred node even though it's the wrong decision.

    The third reason is that multiple threads in a process will race each
    other to fault the shared page, making the fault information unreliable.
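
    A hedged sketch of the private/shared classification (helper name
    hypothetical; LAST__PID_MASK as in the cpu,pid entry above):

        /* A fault counts as task-private when the pid bits recorded in
         * the page flags match the faulting task; occasional collisions
         * are tolerated as noise. */
        static bool numa_fault_is_private(int last_pid_bits, int faulting_pid)
        {
                return last_pid_bits == (faulting_pid & LAST__PID_MASK);
        }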

    Signed-off-by: Mel Gorman
    [ Fix compilation error when !NUMA_BALANCING. ]
    Reviewed-by: Rik van Riel
    Cc: Andrea Arcangeli
    Cc: Johannes Weiner
    Cc: Srikar Dronamraju
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1381141781-10992-30-git-send-email-mgorman@suse.de
    Signed-off-by: Ingo Molnar

    Mel Gorman
     

04 Jul, 2013

1 commit

  • Currently the per cpu counter's batch size for memory accounting is
    configured as twice the number of cpus in the system. However, for
    systems with very large memory, it is more appropriate to make it
    proportional to the memory size per cpu in the system.

    For example, for an x86_64 system with 64 cpus and 128 GB of memory, the
    batch size is only 2*64 pages (0.5 MB). So any memory accounting
    changes of more than 0.5MB will overflow the per cpu counter into the
    global counter. Instead, for the new scheme, the batch size is
    configured to be 0.4% of the memory/cpu = 8MB (128 GB / 64 / 256), which is
    more in line with the memory size.
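
    A sketch of the sizing, reconstructed from the description above (close
    to the patch's mm_compute_batch(), but treat the details as approximate):

        static void __meminit mm_compute_batch(void)
        {
                u64 memsized_batch;
                s32 nr = num_present_cpus();
                s32 batch = max_t(s32, nr * 2, 32);  /* old floor: 2*ncpus */

                /* 0.4% of (total memory / #cpus), in pages: 1/256th */
                memsized_batch = min_t(u64, (totalram_pages / nr) / 256,
                                       0x7fffffff);

                vm_committed_as_batch = max_t(s32, memsized_batch, batch);
        }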

    I've done a repeated brk test of 800KB (from will-it-scale test suite)
    with 80 concurrent processes on a 4 socket Westmere machine with a total
    of 40 cores. Without the patch, about 80% of cpu is spent on spin-lock
    contention within the vm_committed_as counter. With the patch, there's
    a 73x speedup on the benchmark and the lock contention drops off almost
    entirely.

    [akpm@linux-foundation.org: fix section mismatch]
    Signed-off-by: Tim Chen
    Cc: Tejun Heo
    Cc: Eric Dumazet
    Cc: Dave Hansen
    Cc: Andi Kleen
    Cc: Wu Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tim Chen
     

24 Feb, 2013

1 commit

  • Answering the question "how much space remains in the page->flags" is
    time-consuming. mminit_loglevel can help answer the question but it
    does not take last_nid information into account. This patch corrects that
    and, while there, corrects the messages related to page flag usage,
    pgshifts and node/zone ids. When applied, the relevant output looks
    something like this, though it will depend on the kernel configuration:

    mminit::pageflags_layout_widths Section 0 Node 9 Zone 2 Lastnid 9 Flags 25
    mminit::pageflags_layout_shifts Section 19 Node 9 Zone 2 Lastnid 9
    mminit::pageflags_layout_pgshifts Section 0 Node 55 Zone 53 Lastnid 44
    mminit::pageflags_layout_nodezoneid Node/Zone ID: 64 -> 53
    mminit::pageflags_layout_usage location: 64 -> 44 layout 44 -> 25 unused 25 -> 0 page-flags

    Signed-off-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

06 Aug, 2008

1 commit

  • gcc-3.2:

    mm/mm_init.c:77:1: directives may not be used inside a macro argument
    mm/mm_init.c:76:47: unterminated argument list invoking macro "mminit_dprintk"
    mm/mm_init.c: In function `mminit_verify_pageflags_layout':
    mm/mm_init.c:80: `mminit_dprintk' undeclared (first use in this function)
    mm/mm_init.c:80: (Each undeclared identifier is reported only once
    mm/mm_init.c:80: for each function it appears in.)
    mm/mm_init.c:80: syntax error before numeric constant

    Also fix a typo in a comment.
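
    The offending construct, in sketch form (illustrative): gcc-3.2 cannot
    handle preprocessor directives inside a macro's argument list, so the
    fix hoists the conditional out of the invocation:

        /* Rejected by gcc-3.2: #ifdef inside the argument list */
        mminit_dprintk(MMINIT_TRACE, "pageflags_layout",
        #ifdef NODE_NOT_IN_PAGE_FLAGS
                       "Node not in page flags\n"
        #else
                       "Node in page flags\n"
        #endif
                       );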

    Reported-by: Adrian Bunk
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

25 Jul, 2008

5 commits

  • Add a kobject to create /sys/kernel/mm when sysfs is mounted. The kobject
    will exist regardless. This will allow for the hugepage related sysfs
    directories to exist under the mm "subsystem" directory. Add an ABI file
    appropriately.

    [kosaki.motohiro@jp.fujitsu.com: fix build]
    Signed-off-by: Nishanth Aravamudan
    Cc: Nick Piggin
    Cc: Mel Gorman
    Signed-off-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nishanth Aravamudan
     
  • Towards the goal of putting all core mm initialization in mm_init.c, I
    plan on putting the creation of an mm kobject in a function in that file.
    However, the file is currently only compiled if CONFIG_DEBUG_MEMORY_INIT
    is set. Remove this dependency, but put the code under an #ifdef on the
    same config option. This should result in no functional changes.
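
    In sketch form (the mm/Makefile change shown as a pseudo-diff in
    comments; exact lines assumed):

        /* mm/Makefile:
         *   -obj-$(CONFIG_DEBUG_MEMORY_INIT) += mm_init.o
         *   +obj-y += mm_init.o
         */

        /* mm/mm_init.c: the debug-only pieces stay guarded */
        #ifdef CONFIG_DEBUG_MEMORY_INIT
        /* mminit_loglevel, verification and tracing code */
        #endif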

    Signed-off-by: Nishanth Aravamudan
    Cc: Nick Piggin
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nishanth Aravamudan
     
  • This patch prints out the zonelists during boot for manual verification by the
    user if mminit_loglevel is MMINIT_VERIFY or higher.

    Signed-off-by: Mel Gorman
    Cc: Christoph Lameter
    Cc: Andy Whitcroft
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Print out information on how the page flags are being used if
    mminit_loglevel is MMINIT_VERIFY or higher, and unconditionally perform
    sanity checks on the flags regardless of loglevel.

    When the page flags are updated with section, node and zone information, a
    check is made to ensure the values can be retrieved correctly. Finally we
    confirm that pfn_to_page and page_to_pfn are the correct inverse functions.
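
    The inverse-function confirmation reduces to a check of this shape
    (sketch, run for an initialised pfn):

        BUG_ON(page_to_pfn(pfn_to_page(pfn)) != pfn);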

    [akpm@linux-foundation.org: fix printk warnings]
    Signed-off-by: Mel Gorman
    Cc: Christoph Lameter
    Cc: Andy Whitcroft
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Boot initialisation is very complex, with significant numbers of
    architecture-specific routines, hooks and code ordering. While significant
    amounts of the initialisation is architecture-independent, it trusts the data
    received from the architecture layer. This is a mistake, and has resulted in
    a number of difficult-to-diagnose bugs.

    This patchset adds some validation and tracing to memory initialisation. It
    also introduces a few basic defensive measures. The validation code can be
    explicitly disabled for embedded systems.

    This patch:

    Add additional debugging and verification code for memory initialisation.

    Once enabled, the verification checks are always run and, when required,
    additional debugging information may be output via the mminit_loglevel=
    command-line parameter.

    The verification code is placed in a new file mm/mm_init.c. Ideally other mm
    initialisation code will be moved here over time.
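
    A sketch of the logging interface this patch introduces (as in
    mm/internal.h; mminit_dprintk() is a printk-style macro gated on
    mminit_loglevel):

        enum mminit_level {
                MMINIT_WARNING,
                MMINIT_VERIFY,
                MMINIT_TRACE
        };

        /* Printed only when mminit_loglevel >= MMINIT_TRACE, e.g. after
         * booting with mminit_loglevel=2 (values mirror the enum): */
        mminit_dprintk(MMINIT_TRACE, "memmap_init",
                       "initialising map node %d\n", nid);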

    Signed-off-by: Mel Gorman
    Cc: Christoph Lameter
    Cc: Andy Whitcroft
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman