08 Oct, 2016

2 commits

  • Defining fields in struct page_ext by hard-coding wastes memory: the
    entry size of struct page_ext includes the size of those fields even
    if the feature is disabled at runtime. Now that extra memory can be
    requested at runtime, page_owner doesn't need to define its own fields
    by hard-coding.

    This patch removes the hard-coded definitions and uses the extra memory
    to store page_owner information. Most of the code changes are just
    mechanical.

    Link: http://lkml.kernel.org/r/1471315879-32294-7-git-send-email-iamjoonsoo.kim@lge.com
    Signed-off-by: Joonsoo Kim
    Acked-by: Vlastimil Babka
    Cc: Minchan Kim
    Cc: Michal Hocko
    Cc: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • Until now, if a page_ext user wanted its own field in page_ext, the
    field had to be defined in struct page_ext by hard-coding. This wastes
    memory in the following situation.

    struct page_ext {
    #ifdef CONFIG_A
    int a;
    #endif
    #ifdef CONFIG_B
    int b;
    #endif
    };

    Assume that the kernel is built with both CONFIG_A and CONFIG_B. Even if
    we enable feature A and don't enable feature B at runtime, each entry of
    struct page_ext takes two ints rather than one. This is an undesirable
    result, so this patch tries to fix it.

    To solve the above problem, this patch adds support for extra space
    allocation at runtime. When the need() callback returns true, that
    user's extra memory requirement is added to the entry size of page_ext,
    and the offset of the user's extra memory space is returned. With this
    offset, the user can use the extra space, and there is no need to define
    the needed field in page_ext by hard-coding.
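
    The offset scheme described above can be sketched in plain userspace C.
    This is only a simplified model under stated assumptions: the names
    ext_base, ext_client, layout_entries and client_data are hypothetical,
    alignment is ignored, and the real kernel code may differ.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the fixed part of each per-page entry. */
struct ext_base {
    unsigned long flags;
};

/* Hypothetical description of one user of the per-page entry. */
struct ext_client {
    size_t size;         /* extra bytes requested per entry */
    size_t offset;       /* filled in: where this client's bytes start */
    bool (*need)(void);  /* true if the feature is enabled at runtime */
};

/*
 * Walk the clients and hand out space only to those whose need()
 * returns true. Returns the resulting per-page entry size.
 * Alignment is ignored to keep the sketch short.
 */
static size_t layout_entries(struct ext_client *clients, size_t n)
{
    size_t entry_size = sizeof(struct ext_base);

    for (size_t i = 0; i < n; i++) {
        if (clients[i].need && clients[i].need()) {
            clients[i].offset = entry_size;
            entry_size += clients[i].size;
        }
    }
    return entry_size;
}

/* Locate a client's private area inside one entry. */
static void *client_data(void *entry, const struct ext_client *c)
{
    return (char *)entry + c->offset;
}

/* Example need() callbacks: feature A enabled, feature B disabled. */
static bool feature_on(void)  { return true; }
static bool feature_off(void) { return false; }
```

    With one client using feature_on and another using feature_off, each
    entry grows by one int rather than two, which is the saving the patch
    is after.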

    This patch only implements the infrastructure. The following patch will
    use it for page_owner, which is the only user with its own fields in
    page_ext.

    Link: http://lkml.kernel.org/r/1471315879-32294-6-git-send-email-iamjoonsoo.kim@lge.com
    Signed-off-by: Joonsoo Kim
    Acked-by: Vlastimil Babka
    Cc: Minchan Kim
    Cc: Michal Hocko
    Cc: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     

27 Jul, 2016

1 commit

  • Currently, we store each page's allocation stacktrace in the
    corresponding page_ext structure, and this requires a lot of memory. As
    a result, a system that is tight on memory doesn't work well if
    page_owner is enabled. Moreover, even with this large memory consumption,
    we cannot get the full stacktrace, because we allocate memory at boot
    time and keep just 8 stacktrace slots to balance memory consumption. We
    could increase the number of slots, but that would make the system
    unusable or change system behaviour.

    To solve the problem, this patch uses stackdepot to store stacktraces.
    It obviously saves memory, but there is a drawback: stackdepot could
    fail.

    stackdepot allocates memory at runtime, so it could fail if the system
    does not have enough memory. However, most allocation stacks are
    generated very early, when there is plenty of memory, so failure would
    not happen easily. Also, one failure means that we miss just one page's
    allocation stacktrace, so it would not be a big problem. In this patch,
    when a memory allocation failure happens, we store a special stacktrace
    handle in the page whose stacktrace could not be saved. With it, the
    user can still estimate memory usage properly even if a failure happens.

    The memory saving is as follows (4GB memory system with page_owner;
    before the patch -> after the patch):

    static allocation:
    92274688 bytes -> 25165824 bytes

    dynamic allocation after boot + kernel build:
    0 bytes -> 327680 bytes

    total:
    92274688 bytes -> 25493504 bytes

    72% reduction in total.
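
    The totals and the percentage above can be re-derived from the quoted
    figures. The sketch below does only that arithmetic; the helper names
    are hypothetical, and the per-page figure additionally assumes 4 KiB
    pages, which the message does not state. Under that assumption, the
    static cost works out to 88 bytes per page before and 24 bytes per page
    after.

```c
#include <stdint.h>

/* Figures quoted in the commit message, in bytes. */
#define STATIC_BEFORE   UINT64_C(92274688)
#define STATIC_AFTER    UINT64_C(25165824)
#define DYNAMIC_BEFORE  UINT64_C(0)
#define DYNAMIC_AFTER   UINT64_C(327680)

static uint64_t total_before(void)
{
    return STATIC_BEFORE + DYNAMIC_BEFORE;
}

static uint64_t total_after(void)
{
    return STATIC_AFTER + DYNAMIC_AFTER;
}

/* Whole-percent reduction, rounded down. */
static unsigned int reduction_percent(uint64_t before, uint64_t after)
{
    return (unsigned int)((before - after) * 100 / before);
}

/* Assumption (not in the message): 4 GiB of RAM split into 4 KiB pages. */
static uint64_t bytes_per_page(uint64_t total_bytes)
{
    const uint64_t nr_pages = (UINT64_C(4) << 30) / 4096;

    return total_bytes / nr_pages;
}
```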

    Note that the implementation looks more complex than one would imagine
    because there is a recursion issue. stackdepot uses the page allocator,
    and page_owner is called at page allocation. Using stackdepot in
    page_owner could call the page allocator again, and then page_owner
    again: that is a recursion. To detect and avoid it, recursion is checked
    whenever we obtain a stacktrace, and page_owner is set to dummy
    information if recursion is found. Dummy information means that this
    page is allocated for the page_owner feature itself (such as
    stackdepot), which is understandable behavior for the user.
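
    One way to picture the recursion check and the two special handles is
    the userspace sketch below. It is a model only: DUMMY_HANDLE,
    FAILURE_HANDLE, is_recursive and save_trace are hypothetical names, the
    depot is passed in as a function pointer, and detecting recursion by
    spotting the save path's own address twice in the trace is an assumption
    made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t handle_t;

/* Hypothetical reserved handles; 0 is treated as "depot failed". */
#define DUMMY_HANDLE   ((handle_t)1)  /* page allocated for the tracker itself */
#define FAILURE_HANDLE ((handle_t)2)  /* the trace could not be stored */

/*
 * If the save path's own address shows up twice in the captured trace,
 * the save path has been re-entered through the allocator.
 */
static bool is_recursive(const uintptr_t *trace, size_t n, uintptr_t self)
{
    size_t hits = 0;

    for (size_t i = 0; i < n; i++) {
        if (trace[i] == self && ++hits == 2)
            return true;
    }
    return false;
}

/* Store a trace, falling back to a special handle instead of failing. */
static handle_t save_trace(const uintptr_t *trace, size_t n, uintptr_t self,
                           handle_t (*depot_save)(const uintptr_t *, size_t))
{
    handle_t h;

    if (is_recursive(trace, n, self))
        return DUMMY_HANDLE;

    h = depot_save(trace, n);
    return h ? h : FAILURE_HANDLE;
}

/* Stand-in depots for experimenting: one succeeds, one is out of memory. */
static handle_t depot_ok(const uintptr_t *trace, size_t n)
{
    (void)trace;
    (void)n;
    return 42;
}

static handle_t depot_fail(const uintptr_t *trace, size_t n)
{
    (void)trace;
    (void)n;
    return 0;
}
```

    Either special handle still identifies a page, so the pages can be
    counted even when no real stacktrace is attached to them.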

    [iamjoonsoo.kim@lge.com: mm-page_owner-use-stackdepot-to-store-stacktrace-v3]
    Link: http://lkml.kernel.org/r/1464230275-25791-6-git-send-email-iamjoonsoo.kim@lge.com
    Link: http://lkml.kernel.org/r/1466150259-27727-7-git-send-email-iamjoonsoo.kim@lge.com
    Signed-off-by: Joonsoo Kim
    Acked-by: Vlastimil Babka
    Acked-by: Michal Hocko
    Cc: Mel Gorman
    Cc: Minchan Kim
    Cc: Alexander Potapenko
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     

16 Mar, 2016

1 commit

  • During migration, page_owner info is now copied with the rest of the
    page, so the stacktrace leading to free page allocation during migration
    is overwritten. For debugging purposes, however, it might be useful to
    know that the page has been migrated since its initial allocation. This
    might happen many times during the lifetime for different reasons and
    fully tracking this, especially with stacktraces would incur extra
    memory costs. As a compromise, store and print the migrate_reason of
    the last migration that occurred to the page. This is enough to
    distinguish compaction, numa balancing etc.

    Example page_owner entry after the patch:

    Page allocated via order 0, mask 0x24200ca(GFP_HIGHUSER_MOVABLE)
    PFN 628753 type Movable Block 1228 type Movable Flags 0x1fffff80040030(dirty|lru|swapbacked)
    [] __alloc_pages_nodemask+0x134/0x230
    [] alloc_pages_vma+0xb5/0x250
    [] shmem_alloc_page+0x61/0x90
    [] shmem_getpage_gfp+0x678/0x960
    [] shmem_fallocate+0x329/0x440
    [] vfs_fallocate+0x140/0x230
    [] SyS_fallocate+0x44/0x70
    [] entry_SYSCALL_64_fastpath+0x12/0x71
    Page has been migrated, last migrate reason: compaction
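
    The "store only the last reason" compromise that produces the final
    line of the entry above can be sketched as follows. The structure,
    enum values, and function names are hypothetical, and the reason
    strings other than "compaction" are illustrative.

```c
#include <stddef.h>
#include <stdio.h>

/* Illustrative subset of reasons; -1 means "never migrated". */
enum reason_sketch {
    REASON_NONE = -1,
    REASON_COMPACTION,
    REASON_NUMA,
};

static const char *const reason_names[] = {
    "compaction",
    "numa_balancing",
};

/* Hypothetical per-page record: one small field, no extra stacktrace. */
struct owner_sketch {
    unsigned int order;
    int last_migrate_reason;
};

/* On allocation, forget any earlier migration. */
static void owner_set(struct owner_sketch *o, unsigned int order)
{
    o->order = order;
    o->last_migrate_reason = REASON_NONE;
}

/* On migration, overwrite the previous reason; only the last one is kept. */
static void owner_note_migration(struct owner_sketch *o, enum reason_sketch r)
{
    o->last_migrate_reason = r;
}

/* Writes the extra line if the page was migrated; returns its length or 0. */
static int describe_migration(const struct owner_sketch *o, char *buf, size_t len)
{
    if (len == 0)
        return 0;
    buf[0] = '\0';
    if (o->last_migrate_reason == REASON_NONE)
        return 0;
    return snprintf(buf, len, "Page has been migrated, last migrate reason: %s\n",
                    reason_names[o->last_migrate_reason]);
}
```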

    Signed-off-by: Vlastimil Babka
    Cc: Joonsoo Kim
    Cc: Minchan Kim
    Cc: Sasha Levin
    Cc: "Kirill A. Shutemov"
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     

11 Sep, 2015

1 commit

  • Knowing the portion of memory that is not used by a certain application or
    memory cgroup (idle memory) can be useful for partitioning the system
    efficiently, e.g. by setting memory cgroup limits appropriately.
    Currently, the only means to estimate the amount of idle memory provided
    by the kernel is /proc/PID/{clear_refs,smaps}: the user can clear the
    access bit for all pages mapped to a particular process by writing 1 to
    clear_refs, wait for some time, and then count smaps:Referenced. However,
    this method has two serious shortcomings:

    - it does not count unmapped file pages
    - it affects the reclaimer logic

    To overcome these drawbacks, this patch introduces two new page flags,
    Idle and Young, and a new sysfs file, /sys/kernel/mm/page_idle/bitmap.
    A page's Idle flag can only be set from userspace by setting the bit in
    /sys/kernel/mm/page_idle/bitmap at the offset corresponding to the page,
    and it is cleared whenever the page is accessed either through page tables
    (it is cleared in page_referenced() in this case) or using the read(2)
    system call (mark_page_accessed()). Thus by setting the Idle flag for
    pages of a particular workload, which can be found e.g. by reading
    /proc/PID/pagemap, waiting for some time to let the workload access its
    working set, and then reading the bitmap file, one can estimate the amount
    of pages that are not used by the workload.
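
    A userspace helper for the set-then-read procedure might look like the
    sketch below. The layout it uses (an array of native-endian 8-byte
    words, with PFN i at bit i % 64 of word i / 64, and zero bits ignored on
    write) is an assumption taken from the kernel's idle page tracking
    documentation rather than from this message. The function names are
    hypothetical, and accessing the file normally requires root.

```c
#define _POSIX_C_SOURCE 200809L

#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define PAGE_IDLE_BITMAP "/sys/kernel/mm/page_idle/bitmap"

/* Byte offset in the bitmap file of the 64-bit word covering this PFN. */
static off_t idle_word_offset(uint64_t pfn)
{
    return (off_t)(pfn / 64 * sizeof(uint64_t));
}

/* Mask selecting this PFN's bit within its word. */
static uint64_t idle_bit_mask(uint64_t pfn)
{
    return UINT64_C(1) << (pfn % 64);
}

/*
 * Mark one page idle. fd is PAGE_IDLE_BITMAP opened read-write.
 * Only the set bit is written; the other bits in the word are zero.
 * Returns 0 on success, -1 on error.
 */
static int mark_idle(int fd, uint64_t pfn)
{
    uint64_t word = idle_bit_mask(pfn);
    ssize_t n = pwrite(fd, &word, sizeof(word), idle_word_offset(pfn));

    return n == (ssize_t)sizeof(word) ? 0 : -1;
}

/* Returns 1 if the page is still idle, 0 if it was accessed, -1 on error. */
static int still_idle(int fd, uint64_t pfn)
{
    uint64_t word;
    ssize_t n = pread(fd, &word, sizeof(word), idle_word_offset(pfn));

    if (n != (ssize_t)sizeof(word))
        return -1;
    return (word & idle_bit_mask(pfn)) != 0;
}
```

    A monitoring tool would call mark_idle() for each PFN of the workload,
    sleep for the sampling interval, and then count the PFNs for which
    still_idle() returns 1.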

    The Young page flag is used to avoid interference with the memory
    reclaimer. A page's Young flag is set whenever the Access bit of a page
    table entry pointing to the page is cleared by writing to the bitmap file.
    If page_referenced() is called on a Young page, it will add 1 to its
    return value, therefore concealing the fact that the Access bit was
    cleared.
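
    The interplay of the Idle and Young flags can be modelled with a small
    simulation. It is a toy model under stated assumptions: sim_page and
    both functions are hypothetical names, and a single boolean stands in
    for the Access bit of every page table entry mapping the page.

```c
#include <stdbool.h>

/* Toy model of one page. */
struct sim_page {
    bool pte_accessed;  /* Access bit in the page table entry */
    bool idle;          /* Idle flag, set via the bitmap file */
    bool young;         /* Young flag, hides the cleared Access bit */
};

/* What writing the page's bit to the bitmap file does in this model. */
static void sim_mark_idle(struct sim_page *p)
{
    if (p->pte_accessed) {
        p->pte_accessed = false;
        p->young = true;  /* remember the reference for the reclaimer */
    }
    p->idle = true;
}

/* What the reclaimer's reference check does in this model. */
static int sim_page_referenced(struct sim_page *p)
{
    int refs = 0;

    if (p->pte_accessed) {
        p->pte_accessed = false;
        p->idle = false;  /* a real access ends idleness */
        refs++;
    }
    if (p->young) {
        p->young = false;
        refs++;           /* conceal that the Access bit was cleared */
    }
    return refs;
}
```

    In this model, a page that was accessed before being marked idle still
    reports one reference to the reclaimer, while its Idle flag stays set
    until the workload touches it again.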

    Note, since there is no room for extra page flags on 32 bit, this feature
    uses extended page flags when compiled on 32 bit.

    [akpm@linux-foundation.org: fix build]
    [akpm@linux-foundation.org: kpageidle requires an MMU]
    [akpm@linux-foundation.org: decouple from page-flags rework]
    Signed-off-by: Vladimir Davydov
    Reviewed-by: Andres Lagar-Cavilla
    Cc: Minchan Kim
    Cc: Raghavendra K T
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Greg Thelen
    Cc: Michel Lespinasse
    Cc: David Rientjes
    Cc: Pavel Emelyanov
    Cc: Cyrill Gorcunov
    Cc: Jonathan Corbet
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     

12 Feb, 2015

1 commit

  • Page owner uses the page_ext structure to keep meta-information for every
    page in the system. The structure also contains a field of type 'struct
    stack_trace', page owner uses this field during invocation of the function
    save_stack_trace. It is easy to notice that keeping a copy of this
    structure for every page in the system is very inefficient in terms of
    memory.

    The patch removes this unnecessary field of page_ext and forces page owner
    to use a stack_trace structure allocated on the stack.
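
    The shape of the change can be illustrated with a userspace sketch. All
    names here (trace_desc, ext_before, ext_after, capture_trace,
    record_owner) are hypothetical, and capture_trace is a stand-in that
    records fake addresses instead of unwinding a real stack.

```c
#include <stddef.h>

#define DEPTH 8

/* Descriptor needed only while a trace is being captured. */
struct trace_desc {
    unsigned int nr_entries;
    unsigned int max_entries;
    unsigned long *entries;
    int skip;
};

/* Before: every per-page entry carries its own descriptor. */
struct ext_before {
    unsigned int order;
    struct trace_desc trace;
    unsigned long entries[DEPTH];
};

/* After: only the captured addresses and their count are kept. */
struct ext_after {
    unsigned int order;
    unsigned int nr_entries;
    unsigned long entries[DEPTH];
};

/* Stand-in for a real unwinder: records two fake return addresses. */
static void capture_trace(struct trace_desc *t)
{
    static const unsigned long fake[] = { 0x1000, 0x2000 };

    for (size_t i = 0; i < 2 && t->nr_entries < t->max_entries; i++)
        t->entries[t->nr_entries++] = fake[i];
}

/* The descriptor lives on the stack for the duration of the call only. */
static void record_owner(struct ext_after *ext, unsigned int order)
{
    struct trace_desc t = {
        .nr_entries = 0,
        .max_entries = DEPTH,
        .entries = ext->entries,
        .skip = 0,
    };

    capture_trace(&t);
    ext->order = order;
    ext->nr_entries = t.nr_entries;
}
```

    Comparing sizeof(struct ext_before) with sizeof(struct ext_after) shows
    the per-page saving in this model.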

    [akpm@linux-foundation.org: use struct initializers]
    Signed-off-by: Sergei Rogachev
    Acked-by: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergei Rogachev
     

14 Dec, 2014

3 commits

  • This is the page owner tracking code, which was introduced long ago. It
    has been resident in Andrew's tree, but nobody tried to upstream it, so
    it has remained as is. Our company actively uses this feature to debug
    memory leaks or to find a memory hogger, so I decided to upstream it.

    This functionality helps us to know who allocated a page. When
    allocating a page, we store some information about the allocation in
    extra memory. Later, if we need to know the status of all pages, we can
    get and analyze it from this stored information.

    In the previous version of this feature, the extra memory was statically
    defined in struct page, but in this version the extra memory is
    allocated outside of struct page. This enables us to turn this feature
    on/off at boot time without considerable memory waste.

    Although we already have tracepoints for tracing page allocation/free,
    using them to analyze page owner is rather complex. We need to enlarge
    the trace buffer to prevent overwriting until the userspace program is
    launched. Also, the launched program continually dumps the trace buffer
    for later analysis, which is more likely to change system behaviour
    than just keeping the information in memory, so it is bad for debugging.

    Moreover, we can use the page_owner feature for various other purposes.
    For example, we can use it for the fragmentation statistics implemented
    in this patch. I also plan to implement a CMA failure debugging feature
    using this interface.

    I'd like to give credit to all the developers who contributed to this
    feature, but it's not easy because I don't know the exact history. Sorry
    about that. Below are the people who have "Signed-off-by" in the patches
    in Andrew's tree.

    Contributors:
    Alexander Nyberg
    Mel Gorman
    Dave Hansen
    Minchan Kim
    Michal Nazarewicz
    Andrew Morton
    Jungsoo Son

    Signed-off-by: Joonsoo Kim
    Cc: Mel Gorman
    Cc: Johannes Weiner
    Cc: Minchan Kim
    Cc: Dave Hansen
    Cc: Michal Nazarewicz
    Cc: Jungsoo Son
    Cc: Ingo Molnar
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • Until now, debug-pagealloc has needed extra flags in struct page, so we
    need to recompile the whole source code when we decide to use it. This
    is really painful, because recompiling takes some time and sometimes a
    rebuild is not possible due to third party modules depending on struct
    page. So, we can't use this good feature in many cases.

    Now, we have the page extension feature that allows us to put extra
    flags outside of struct page. This gets rid of the third party module
    issue mentioned above. It also allows us to determine at boot time
    whether we need extra memory for this page extension. With these
    properties, a kernel built with CONFIG_DEBUG_PAGEALLOC can avoid using
    debug-pagealloc at boot time with low computational overhead. This will
    help our development process greatly.

    This patch is the preparation step to achieve the above goal.
    debug-pagealloc originally uses an extra field of struct page, but after
    this patch it will use a field of struct page_ext. Because memory for
    page_ext is allocated later than initialization of the page allocator
    with CONFIG_SPARSEMEM, we should disable the debug-pagealloc feature
    temporarily until page_ext is initialized. This patch implements this.

    Signed-off-by: Joonsoo Kim
    Cc: Mel Gorman
    Cc: Johannes Weiner
    Cc: Minchan Kim
    Cc: Dave Hansen
    Cc: Michal Nazarewicz
    Cc: Jungsoo Son
    Cc: Ingo Molnar
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • When we debug something, we'd like to attach some information to every
    page. For this purpose, we sometimes modify struct page itself. But
    this has drawbacks. First, it requires a recompile. This makes us
    hesitate to use the powerful debug feature, so the development process
    is slowed down. Second, sometimes it is impossible to rebuild the
    kernel due to third party module dependencies. Third, system behaviour
    would be largely different after the recompile, because it changes the
    size of struct page greatly and this structure is accessed by every part
    of the kernel. Keeping struct page as it is makes it easier to reproduce
    the erroneous situation.

    This feature is intended to overcome the above mentioned problems. It
    allocates memory for extended per-page data in a separate place rather
    than in struct page itself. This memory can be accessed by the accessor
    functions provided by this code. During the boot process, it checks
    whether allocation of a huge chunk of memory is needed or not. If not,
    it avoids allocating memory at all. With this advantage, we can include
    this feature in the kernel by default, avoid rebuilds, and solve the
    related problems.

    Until now, memcg used this technique. But memcg has now decided to
    embed its variable in struct page itself, and its code to extend struct
    page has been removed. I'd like to use this code to develop debug
    features, so this patch resurrects it.

    To help these things work well, this patch introduces two callbacks for
    clients. One is the need callback, which is mandatory if the user wants
    to avoid useless memory allocation at boot time. The other is the
    optional init callback, which is used to do proper initialization after
    memory is allocated. A detailed explanation of the purpose of these
    functions is in the code comments. Please refer to them.
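
    The boot-time flow built on these two callbacks can be sketched in
    userspace C. This is a model under stated assumptions: ext_ops,
    any_client_needs and ext_boot are hypothetical names, calloc() stands
    in for the boot-time allocation, and treating a missing need callback
    as "no request" is a choice made for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical pair of callbacks registered by each client. */
struct ext_ops {
    bool (*need)(void);  /* does this client want memory in this boot? */
    void (*init)(void);  /* optional: runs once the memory exists */
};

/* True as soon as one client asks for memory. */
static bool any_client_needs(const struct ext_ops *const *ops, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (ops[i]->need && ops[i]->need())
            return true;
    }
    return false;
}

/*
 * Allocate the per-page table only if somebody needs it, then run the
 * init callbacks. Returns NULL when nothing was allocated.
 */
static void *ext_boot(const struct ext_ops *const *ops, size_t n,
                      size_t nr_pages, size_t entry_size)
{
    void *table;

    if (!any_client_needs(ops, n))
        return NULL;  /* no client enabled: no memory is spent */

    table = calloc(nr_pages, entry_size);
    if (!table)
        return NULL;

    for (size_t i = 0; i < n; i++) {
        if (ops[i]->init)
            ops[i]->init();
    }
    return table;
}

/* Example callbacks for experimenting with the flow. */
static int init_calls;
static bool need_yes(void)   { return true; }
static bool need_no(void)    { return false; }
static void count_init(void) { init_calls++; }
```

    Because the decision is made from callbacks at boot, a kernel can ship
    with the feature compiled in and pay for the table only when a client
    is actually switched on.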

    Everything else is the same as the previous extension code in memcg.

    Signed-off-by: Joonsoo Kim
    Cc: Mel Gorman
    Cc: Johannes Weiner
    Cc: Minchan Kim
    Cc: Dave Hansen
    Cc: Michal Nazarewicz
    Cc: Jungsoo Son
    Cc: Ingo Molnar
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim