23 Feb, 2017

3 commits

  • mm_vmscan_lru_isolate currently prints only whether the LRU we isolate
    from is file or anonymous, but not which LRU it is.

    It is useful to know whether the list is active or inactive, since we
    use the same function to isolate pages from both of them and it is hard
    to distinguish them otherwise. An illustrative sketch of the symbolic
    LRU-name translation is given below.

    Link: http://lkml.kernel.org/r/20170104101942.4860-5-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Acked-by: Hillf Danton
    Acked-by: Mel Gorman
    Acked-by: Minchan Kim
    Acked-by: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
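
    For illustration only, a minimal user-space sketch of the symbolic-name
    translation; the enum entries and names here are assumptions modelled on
    the kernel's LRU lists, not the actual tracepoint code:

      #include <stdio.h>

      enum lru_list {
              LRU_INACTIVE_ANON,
              LRU_ACTIVE_ANON,
              LRU_INACTIVE_FILE,
              LRU_ACTIVE_FILE,
              LRU_UNEVICTABLE,
              NR_LRU_LISTS
      };

      static const char * const lru_names[NR_LRU_LISTS] = {
              [LRU_INACTIVE_ANON] = "inactive_anon",
              [LRU_ACTIVE_ANON]   = "active_anon",
              [LRU_INACTIVE_FILE] = "inactive_file",
              [LRU_ACTIVE_FILE]   = "active_file",
              [LRU_UNEVICTABLE]   = "unevictable",
      };

      int main(void)
      {
              /* With the list index in hand, a trace line can name the exact
               * LRU instead of only an anon/file flag. */
              printf("mm_vmscan_lru_isolate: ... lru=%s\n",
                     lru_names[LRU_ACTIVE_FILE]);
              return 0;
      }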
     
  • OOM debugging for higher-order requests is currently quite hard. We do
    have some compaction trace points which can tell us how compaction is
    operating, but there is no trace point to tell us about the compaction
    retry logic. This patch adds one with the following format:

    bash-3126 [001] .... 1498.220001: compact_retry: order=9 priority=COMPACT_PRIO_SYNC_LIGHT compaction_result=withdrawn retries=0 max_retries=16 should_retry=0

    Here we can see that the order-9 request is not retried even though we
    are in the highest compaction priority mode, because the last compaction
    attempt was withdrawn. This means that compaction_zonelist_suitable must
    have returned false: there is no suitable zone to compact for this
    request, so there is no need to retry further.

    Another example:
    -3137 [001] .... 81.501689: compact_retry: order=9 priority=COMPACT_PRIO_SYNC_LIGHT compaction_result=failed retries=0 max_retries=16 should_retry=0

    In this case the order-9 compaction failed to find any suitable block.
    We do not retry any more because this is a costly request and those do
    not go below COMPACT_PRIO_SYNC_LIGHT priority. The kind of decision this
    trace point reports is sketched below.

    Link: http://lkml.kernel.org/r/20161220130135.15719-4-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Cc: David Rientjes
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
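
    A minimal sketch of the kind of decision the new trace point reports,
    with simplified types and a made-up trace_compact_retry() helper; this
    is not the kernel's retry logic:

      #include <stdbool.h>
      #include <stdio.h>

      #define MAX_COMPACT_RETRIES 16

      struct compact_retry_info {
              int order;
              const char *priority;   /* e.g. "COMPACT_PRIO_SYNC_LIGHT" */
              const char *result;     /* e.g. "withdrawn" or "failed"   */
              int retries;
              bool should_retry;
      };

      static void trace_compact_retry(const struct compact_retry_info *i)
      {
              printf("compact_retry: order=%d priority=%s compaction_result=%s "
                     "retries=%d max_retries=%d should_retry=%d\n",
                     i->order, i->priority, i->result,
                     i->retries, MAX_COMPACT_RETRIES, i->should_retry);
      }

      int main(void)
      {
              struct compact_retry_info info = {
                      .order = 9,
                      .priority = "COMPACT_PRIO_SYNC_LIGHT",
                      .result = "withdrawn",
                      .retries = 0,
                      /* "withdrawn" means no zone was suitable for compaction,
                       * so retrying cannot help regardless of the retry budget. */
                      .should_retry = false,
              };

              trace_compact_retry(&info);
              return 0;
      }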
     
  • COMPACTION_STATUS and ZONE_TYPE are currently used to translate enum
    compact_result and the struct zone index, respectively, into symbolic
    names for easier post-processing. A follow-up patch would like to reuse
    this as well. The code involves some preprocessor black magic which is
    better not duplicated elsewhere, so move it to a common mm tracing
    related header. The pattern is sketched below.

    Link: http://lkml.kernel.org/r/20161220130135.15719-2-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Cc: David Rientjes
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
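
    The pattern being moved is roughly "define the list once, expand it
    twice". A simplified user-space sketch, with illustrative entry names
    rather than the kernel macros:

      #include <stdio.h>

      /* Single source of truth for the status list. */
      #define COMPACTION_STATUS(EM)                   \
              EM(COMPACT_SKIPPED,  "skipped")         \
              EM(COMPACT_DEFERRED, "deferred")        \
              EM(COMPACT_COMPLETE, "complete")

      /* Expansion 1: the enum itself. */
      #define AS_ENUM(sym, name) sym,
      enum compact_result { COMPACTION_STATUS(AS_ENUM) };

      /* Expansion 2: the symbolic-name table used for post-processing. */
      #define AS_NAME(sym, name) [sym] = name,
      static const char * const compaction_status_names[] = {
              COMPACTION_STATUS(AS_NAME)
      };

      int main(void)
      {
              printf("%s\n", compaction_status_names[COMPACT_DEFERRED]);
              return 0;
      }

    Keeping the list in one shared header means every user expands the very
    same entries, so the symbolic names cannot drift apart.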
     

11 Jan, 2017

1 commit

  • The flag was introduced by commit 78afd5612deb ("mm: add
    __GFP_OTHER_NODE flag") to allow proper accounting of remote node
    allocations done by kernel daemons on behalf of a process - e.g.
    khugepaged.

    After "mm: fix remote numa hits statistics" we do not need and actually
    use the flag so we can safely remove it because all allocations which
    are satisfied from their "home" node are accounted properly.

    [mhocko@suse.com: fix build]
    Link: http://lkml.kernel.org/r/20170106122225.GK5556@dhcp22.suse.cz
    Link: http://lkml.kernel.org/r/20170102153057.9451-3-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Acked-by: Mel Gorman
    Acked-by: Vlastimil Babka
    Cc: Michal Hocko
    Cc: Johannes Weiner
    Cc: Joonsoo Kim
    Cc: Taku Izumi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

26 Dec, 2016

2 commits

  • Add a new page flag, PageWaiters, to indicate the page waitqueue has
    tasks waiting. This can be tested rather than testing waitqueue_active
    which requires another cacheline load.

    This bit is always set when the page has tasks on page_waitqueue(page),
    and is set and cleared under the waitqueue lock. It may be set when
    there are no tasks on the waitqueue, which will only cause a harmless
    extra wakeup check that clears the bit.

    The generic bit-waitqueue infrastructure is no longer used for pages.
    Instead, waitqueues are used directly with a custom key type. The
    generic code was not flexible enough to have PageWaiters manipulation
    under the waitqueue lock (which simplifies concurrency).

    This improves the performance of page lock intensive microbenchmarks by
    2-3%.

    Putting two bits in the same word opens the opportunity to remove the
    memory barrier between clearing the lock bit and testing the waiters
    bit, after some work on the arch primitives (e.g., ensuring memory
    operand widths match and cover both bits). The resulting unlock fast
    path is sketched below.

    Signed-off-by: Nicholas Piggin
    Cc: Dave Hansen
    Cc: Bob Peterson
    Cc: Steven Whitehouse
    Cc: Andrew Lutomirski
    Cc: Andreas Gruenbacher
    Cc: Peter Zijlstra
    Cc: Mel Gorman
    Signed-off-by: Linus Torvalds

    Nicholas Piggin
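
    A sketch of the resulting unlock fast path, using C11 atomics and
    assumed bit names; an illustration of the idea, not the kernel code:

      #include <stdatomic.h>
      #include <stdio.h>

      #define PG_locked  (1u << 0)
      #define PG_waiters (1u << 1)  /* set under the waitqueue lock */

      static void wake_up_page_waiters(void)
      {
              puts("take waitqueue lock, wake waiters, clear PG_waiters if empty");
      }

      static void unlock_page(atomic_uint *flags)
      {
              /* Clear the lock bit and observe the old flags in one atomic op;
               * both bits live in the same word, so no extra cacheline load. */
              unsigned int old = atomic_fetch_and(flags, ~PG_locked);

              /* A stale PG_waiters only costs one harmless wakeup check that
               * clears the bit again. */
              if (old & PG_waiters)
                      wake_up_page_waiters();
      }

      int main(void)
      {
              atomic_uint flags = PG_locked;               /* locked, no waiters */
              unlock_page(&flags);                         /* no wakeup needed   */

              atomic_store(&flags, PG_locked | PG_waiters);
              unlock_page(&flags);                         /* waiters: wake them */
              return 0;
      }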
     
  • A page is not added to the swap cache without being swap backed,
    so PageSwapBacked mappings can use PG_owner_priv_1 for PageSwapCache.

    Signed-off-by: Nicholas Piggin
    Acked-by: Hugh Dickins
    Cc: Dave Hansen
    Cc: Bob Peterson
    Cc: Steven Whitehouse
    Cc: Andrew Lutomirski
    Cc: Andreas Gruenbacher
    Cc: Peter Zijlstra
    Cc: Mel Gorman
    Signed-off-by: Linus Torvalds

    Nicholas Piggin
     

29 Jul, 2016

1 commit

  • After the previous patch, we can distinguish costly allocations that
    should be really lightweight, such as THP page faults, with
    __GFP_NORETRY. This means we don't need to recognize khugepaged
    allocations via PF_KTHREAD anymore. We can also change THP page faults
    in areas where madvise(MADV_HUGEPAGE) was used to try as hard as
    khugepaged, as the process has indicated that it benefits from THPs and
    is willing to pay some initial latency costs.

    We can also make the flags handling less cryptic by distinguishing
    GFP_TRANSHUGE_LIGHT (no reclaim at all, default mode in page fault) from
    GFP_TRANSHUGE (only direct reclaim, khugepaged default). Adding
    __GFP_NORETRY or __GFP_KSWAPD_RECLAIM is done where needed.

    The patch effectively changes the current GFP_TRANSHUGE users as
    follows:

    * get_huge_zero_page() - the zero page lifetime should be relatively
    long and it's shared by multiple users, so it's worth spending some
    effort on it. We use GFP_TRANSHUGE, and __GFP_NORETRY is not added.
    This also restores direct reclaim to this allocation, which was
    unintentionally removed by commit e4a49efe4e7e ("mm: thp: set THP defrag
    by default to madvise and add a stall-free defrag option")

    * alloc_hugepage_khugepaged_gfpmask() - this is khugepaged, so latency
    is not an issue. So if khugepaged "defrag" is enabled (the default), do
    reclaim via GFP_TRANSHUGE without __GFP_NORETRY. We can remove the
    PF_KTHREAD check from page alloc.

    As a side-effect, khugepaged will now no longer check if the initial
    compaction was deferred or contended. This is OK, as khugepaged sleep
    times between collapse attempts are long enough to prevent noticeable
    disruption, so we should allow it to spend some effort.

    * migrate_misplaced_transhuge_page() - already was masking out
    __GFP_RECLAIM, so just convert to GFP_TRANSHUGE_LIGHT which is
    equivalent.

    * alloc_hugepage_direct_gfpmask() - VMAs with VM_HUGEPAGE (via madvise)
    are now allocating without __GFP_NORETRY. Other VMAs keep using
    __GFP_NORETRY if direct reclaim/compaction is at all allowed (by default
    it is allowed only for madvised VMAs). The rest is conversion to
    GFP_TRANSHUGE(_LIGHT). The flag relationships are sketched below.

    [mhocko@suse.com: suggested GFP_TRANSHUGE_LIGHT]
    Link: http://lkml.kernel.org/r/20160721073614.24395-7-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Acked-by: Michal Hocko
    Acked-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
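
    A hypothetical sketch of the flag relationships described above; the bit
    values are placeholders and only the relations follow this changelog
    (GFP_TRANSHUGE = GFP_TRANSHUGE_LIGHT plus direct reclaim, and by default
    only madvised VMAs try as hard as khugepaged):

      #include <stdbool.h>
      #include <stdio.h>

      #define __GFP_DIRECT_RECLAIM (1u << 0)
      #define __GFP_NORETRY        (1u << 1)
      #define GFP_THP_BASE         (1u << 2)  /* stand-in for the common THP flags */

      #define GFP_TRANSHUGE_LIGHT  GFP_THP_BASE  /* no reclaim at all */
      #define GFP_TRANSHUGE        (GFP_TRANSHUGE_LIGHT | __GFP_DIRECT_RECLAIM)

      /* Page-fault gfp mask under the default "defrag=madvise" policy:
       * only madvised VMAs may direct reclaim/compact, and they do so
       * without __GFP_NORETRY; everything else stays lightweight. */
      static unsigned int thp_fault_gfpmask(bool vma_madvised)
      {
              return vma_madvised ? GFP_TRANSHUGE : GFP_TRANSHUGE_LIGHT;
      }

      int main(void)
      {
              printf("madvised=%#x other=%#x khugepaged=%#x\n",
                     thp_fault_gfpmask(true), thp_fault_gfpmask(false),
                     GFP_TRANSHUGE /* khugepaged with "defrag" enabled */);
              return 0;
      }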
     

18 Mar, 2016

1 commit

  • Bring the list of VMA flags up to date and sort it to match the VM_*
    definition order.

    [vbabka@suse.cz: add a note above vmaflag definitions to update the names when changing]
    Signed-off-by: Kirill A. Shutemov
    Acked-by: Vlastimil Babka
    Signed-off-by: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

16 Mar, 2016

1 commit

  • In tracepoints, it's possible to print gfp flags in a human-friendly
    format through a macro show_gfp_flags(), which defines a translation
    array and passes it to __print_flags(). Since the following patch will
    introduce support for gfp flags printing in printk(), it would be nice
    to reuse the array. This is not straightforward, since __print_flags()
    can't simply reference an array defined in a .c file such as mm/debug.c
    - it has to be a macro to allow the macro magic to communicate the
    format to userspace tools such as trace-cmd.

    The solution is to create a macro __def_gfpflag_names which is used
    both in show_gfp_flags() and to define the gfpflag_names[] array in
    mm/debug.c. A simplified sketch of this pattern is given below.

    On the other hand, mm/debug.c also defines translation tables for page
    flags and vma flags, and desire was expressed (but not implemented in
    this series) to use these also from tracepoints. Thus, this patch also
    renames the events/gfpflags.h file to events/mmflags.h and moves the
    table definitions there, using the same macro approach as for gfpflags.
    This allows translating all three kinds of mm-specific flags both in
    tracepoints and printk.

    Signed-off-by: Vlastimil Babka
    Reviewed-by: Michal Hocko
    Cc: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Ingo Molnar
    Cc: Rasmus Villemoes
    Cc: Joonsoo Kim
    Cc: Minchan Kim
    Cc: Sasha Levin
    Cc: "Kirill A. Shutemov"
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
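
    A simplified sketch of the "same macro, two expansions" approach using
    the names mentioned above (__def_gfpflag_names, gfpflag_names[]); the
    flag values are placeholders and this is not the kernel header:

      #include <stdio.h>

      struct flag_name { unsigned long mask; const char *name; };

      /* events/mmflags.h (illustrative): the list lives in a macro so the
       * tracepoint side can paste it as a literal initializer. */
      #define __def_gfpflag_names            \
              {0x01ul, "__GFP_DMA"},         \
              {0x02ul, "__GFP_HIGHMEM"},     \
              {0x04ul, "__GFP_ZERO"}

      /* mm/debug.c (illustrative): the same macro also defines a real array
       * that printk-style code can walk. */
      static const struct flag_name gfpflag_names[] = { __def_gfpflag_names };

      static void print_gfp_flags(unsigned long gfp)
      {
              for (size_t i = 0; i < sizeof(gfpflag_names) / sizeof(gfpflag_names[0]); i++)
                      if (gfp & gfpflag_names[i].mask)
                              printf("%s|", gfpflag_names[i].name);
              printf("\n");
      }

      int main(void)
      {
              print_gfp_flags(0x05ul);  /* -> __GFP_DMA|__GFP_ZERO| */
              return 0;
      }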