26 Oct, 2020

1 commit


17 Oct, 2020

1 commit

  • Use helper macro abs() to simplify the "x > t || x < -t" comparison.
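
    A minimal standalone illustration of the transformation (values made up;
    labs() stands in here for the kernel's type-generic abs()):

        #include <stdio.h>
        #include <stdlib.h>

        int main(void)
        {
                long x = -42, t = 10;

                int before = (x > t || x < -t); /* open-coded range check */
                int after = (labs(x) > t);      /* the same check via abs() */

                printf("%d %d\n", before, after); /* prints: 1 1 */
                return 0;
        }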

    Signed-off-by: Miaohe Lin
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Link: https://lkml.kernel.org/r/20200905084008.15748-1-linmiaohe@huawei.com
    Signed-off-by: Linus Torvalds

    Miaohe Lin
     

07 Sep, 2020

1 commit


05 Sep, 2020

2 commits

  • Merge emailed patches from Peter Xu:
    "This is a small series that I picked up from Linus's suggestion to
    simplify cow handling (and also make it more strict) by checking
    against page refcounts rather than mapcounts.

    This makes uffd-wp work again (verified by running upmapsort)"

    Note: this is horrendously bad timing, and making this kind of
    fundamental vm change after -rc3 is not at all how things should work.
    The saving grace is that it really is a nice simplification:

    8 files changed, 29 insertions(+), 120 deletions(-)

    The reason for the bad timing is that it turns out that commit
    17839856fd58 ("gup: document and work around 'COW can break either way'
    issue") broke not just UFFD functionality (as Peter noticed), but Mikulas
    Patocka also reports that it caused issues for strace when running in a
    DAX environment with ext4 on a persistent memory setup.

    And we can't just revert that commit without re-introducing the original
    issue that is a potential security hole, so making COW stricter (and in
    the process much simpler) is a step to then undoing the forced COW that
    broke other uses.

    Link: https://lore.kernel.org/lkml/alpine.LRH.2.02.2009031328040.6929@file01.intranet.prod.int.rdu2.redhat.com/

    * emailed patches from Peter Xu:
    mm: Add PGREUSE counter
    mm/gup: Remove enfornced COW mechanism
    mm/ksm: Remove reuse_ksm_page()
    mm: do_wp_page() simplification
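
    A hedged sketch of the refcount-based reuse decision the series moves to
    (names from mm/memory.c; the actual patch differs in detail):

        /* In the write-protect fault path: reuse the page only when we hold
         * the sole reference; otherwise copy. No mapcount games. */
        if (PageAnon(page) && page_count(page) == 1)
                wp_page_reuse(vmf);     /* nobody else can see the page */
        else
                wp_page_copy(vmf);      /* someone else might: do the COW */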

    Linus Torvalds
     
  • Add a PGREUSE counter, which accounts for the wp_page_reuse() case, where
    we reuse a page for COW instead of copying it.
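
    The counter shows up in /proc/vmstat as "pgreuse". A small standalone
    reader (assuming a kernel that includes this patch):

        #include <stdio.h>
        #include <string.h>

        int main(void)
        {
                char name[128];
                unsigned long long val;
                FILE *f = fopen("/proc/vmstat", "r");

                if (!f)
                        return 1;
                while (fscanf(f, "%127s %llu", name, &val) == 2)
                        if (strcmp(name, "pgreuse") == 0)
                                printf("pgreuse = %llu\n", val);
                fclose(f);
                return 0;
        }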

    Signed-off-by: Peter Xu
    Signed-off-by: Linus Torvalds

    Peter Xu
     

17 Aug, 2020

1 commit


15 Aug, 2020

1 commit

  • This reverts commit 26e7deadaae175.

    Sonny reported that one of their tests started failing on the latest
    kernel on their Chrome OS platform. The root cause is that the above
    commit removed the lowmem reserve protection line for empty zones, while
    the parser used in the test relies on that line to mark the end of each
    zone.

    Let's revert it to avoid breaking userspace testing or applications.

    Fixes: 26e7deadaae175 ("mm/vmstat.c: do not show lowmem reserve protection information of empty zone")
    Reported-by: Sonny Rao
    Signed-off-by: Baoquan He
    Signed-off-by: Andrew Morton
    Reviewed-by: David Hildenbrand
    Acked-by: David Rientjes
    Cc: [5.8.x]
    Link: http://lkml.kernel.org/r/20200811075412.12872-1-bhe@redhat.com
    Signed-off-by: Linus Torvalds

    Baoquan He
     

13 Aug, 2020

5 commits

  • …ernel/git/abelloni/linux") into android-mainline

    Steps on the way to 5.9-rc1.

    Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
    Change-Id: Iceded779988ff472863b7e1c54e22a9fa6383a30

    Greg Kroah-Hartman
     
  • Add following new vmstat events which will help in validating THP
    migration without split. Statistics reported through these new VM events
    will help in performance debugging.

    1. THP_MIGRATION_SUCCESS
    2. THP_MIGRATION_FAILURE
    3. THP_MIGRATION_SPLIT

    In addition, these new events also update normal page migration statistics
    appropriately via PGMIGRATE_SUCCESS and PGMIGRATE_FAILURE. While here,
    this updates current trace event 'mm_migrate_pages' to accommodate now
    available THP statistics.
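
    A hedged sketch of how the accounting fits together (count_vm_event() and
    count_vm_events() are the standard vmstat hooks; the control flow and
    nr_subpages are illustrative, and the event names are the ones listed
    above):

        static void account_thp_migration(bool migrated, bool split,
                                          int nr_subpages)
        {
                if (migrated) {
                        count_vm_event(THP_MIGRATION_SUCCESS);
                        count_vm_events(PGMIGRATE_SUCCESS, nr_subpages);
                } else if (split) {
                        count_vm_event(THP_MIGRATION_SPLIT);
                } else {
                        count_vm_event(THP_MIGRATION_FAILURE);
                        count_vm_events(PGMIGRATE_FAILURE, nr_subpages);
                }
        }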

    [akpm@linux-foundation.org: s/hpage_nr_pages/thp_nr_pages/]
    [ziy@nvidia.com: v2]
    Link: http://lkml.kernel.org/r/C5E3C65C-8253-4638-9D3C-71A61858BB8B@nvidia.com
    [anshuman.khandual@arm.com: s/thp_nr_pages/hpage_nr_pages/]
    Link: http://lkml.kernel.org/r/1594287583-16568-1-git-send-email-anshuman.khandual@arm.com

    Signed-off-by: Anshuman Khandual
    Signed-off-by: Zi Yan
    Signed-off-by: Andrew Morton
    Reviewed-by: Daniel Jordan
    Cc: Hugh Dickins
    Cc: Matthew Wilcox
    Cc: Zi Yan
    Cc: John Hubbard
    Cc: Naoya Horiguchi
    Link: http://lkml.kernel.org/r/1594080415-27924-1-git-send-email-anshuman.khandual@arm.com
    Signed-off-by: Linus Torvalds

    Anshuman Khandual
     
  • Proactive compaction uses a per-node/zone "fragmentation score" which is
    always in the range [0, 100], so use an unsigned type for these scores as
    well as for the related constants.

    Signed-off-by: Nitin Gupta
    Signed-off-by: Andrew Morton
    Reviewed-by: Baoquan He
    Cc: Luis Chamberlain
    Cc: Kees Cook
    Cc: Iurii Zaikin
    Cc: Vlastimil Babka
    Cc: Joonsoo Kim
    Link: http://lkml.kernel.org/r/20200618010319.13159-1-nigupta@nvidia.com
    Signed-off-by: Linus Torvalds

    Nitin Gupta
     
  • For some applications, we need to allocate almost all memory as hugepages.
    However, on a running system, higher-order allocations can fail if the
    memory is fragmented. The Linux kernel currently does on-demand compaction
    as we request more hugepages, but this style of compaction incurs very
    high latency. Experiments with one-time full memory compaction (followed
    by hugepage allocations) show that the kernel is able to restore a highly
    fragmented memory state to a fairly compacted memory state within <1 sec
    for a 32G system. Such data suggests that a more proactive compaction can
    help us allocate a large fraction of memory as hugepages while keeping
    allocation latencies low. This approach is largely based on ideas from
    Michal Hocko [2]; see also an earlier version of this patch [1] and the
    LWN article on proactive compaction [3].

    1. Hugepage allocation latencies

    Memory is first fragmented, then 2M hugepages are allocated and
    per-allocation latencies are recorded.

    - With 5.6.0-rc3 + this patch, with proactiveness=20

    sysctl -w vm.compaction_proactiveness=20

    percentile latency
    –––––––––– –––––––
    5 2
    10 2
    25 3
    30 3
    40 3
    50 4
    60 4
    75 4
    80 4
    90 5
    95 429

    Total 2M hugepages allocated = 384105 (750G worth of hugepages out of 762G
    total free => 98% of free memory could be allocated as hugepages)

    2. JAVA heap allocation

    In this test, we first fragment memory using the same method as for (1).

    Then, we start a Java process with a heap size set to 700G and request the
    heap to be allocated with THP hugepages. We also set THP to madvise to
    allow hugepage backing of this heap.

    /usr/bin/time
    java -Xms700G -Xmx700G -XX:+UseTransparentHugePages -XX:+AlwaysPreTouch

    The above command allocates 700G of Java heap using hugepages.

    - With vanilla 5.6.0-rc3

    17.39user 1666.48system 27:37.89elapsed

    - With 5.6.0-rc3 + this patch, with proactiveness=20

    8.35user 194.58system 3:19.62elapsed

    Elapsed time remains around 3:15 as proactiveness is increased further.

    Note that proactive compaction happens throughout the runtime of these
    workloads. The situation of one-time compaction, sufficient to supply
    hugepages for following allocation stream, can probably happen for more
    extreme proactiveness values, like 80 or 90.

    In the above Java workload, proactiveness is set to 20. The test starts
    with a node's score of 80 or higher, depending on the delay between the
    fragmentation step and starting the benchmark, which gives more or less
    time for the initial round of compaction. As the benchmark consumes
    hugepages, the node's score quickly rises above the high threshold (90)
    and proactive compaction starts again, which brings the score back down
    to the low threshold level (80). Repeat.
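
    The threshold arithmetic implied by the numbers above, as a standalone
    sketch (the formula low = 100 - proactiveness, high = low + 10 is
    inferred from the 80/90 values quoted here):

        #include <stdio.h>

        int main(void)
        {
                unsigned int proactiveness = 20; /* vm.compaction_proactiveness */
                unsigned int low = 100 - proactiveness; /* compact down to this */
                unsigned int high = low + 10;  /* start compacting above this */

                printf("low=%u high=%u\n", low, high); /* low=80 high=90 */
                return 0;
        }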

    bpftrace also confirms proactive compaction running 20+ times during the
    runtime of this Java benchmark. kcompactd threads consume 100% of one of
    the CPUs while trying to bring a node's score within thresholds.

    Backoff behavior
    ================

    The above workloads produce a memory state which is easy to compact.
    However, if memory is filled with unmovable pages, proactive compaction
    should essentially back off. To test this aspect:

    - Created a kernel driver that allocates almost all memory as hugepages
    followed by freeing first 3/4 of each hugepage.
    - Set proactiveness=40
    - Note that proactive_compact_node() is deferred the maximum number of
    times with HPAGE_FRAG_CHECK_INTERVAL_MSEC of wait between each check
    (=> ~30 seconds between retries).

    [1] https://patchwork.kernel.org/patch/11098289/
    [2] https://lore.kernel.org/linux-mm/20161230131412.GI13301@dhcp22.suse.cz/
    [3] https://lwn.net/Articles/817905/

    Signed-off-by: Nitin Gupta
    Signed-off-by: Andrew Morton
    Tested-by: Oleksandr Natalenko
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Khalid Aziz
    Reviewed-by: Oleksandr Natalenko
    Cc: Vlastimil Babka
    Cc: Khalid Aziz
    Cc: Michal Hocko
    Cc: Mel Gorman
    Cc: Matthew Wilcox
    Cc: Mike Kravetz
    Cc: Joonsoo Kim
    Cc: David Rientjes
    Cc: Nitin Gupta
    Cc: Oleksandr Natalenko
    Link: http://lkml.kernel.org/r/20200616204527.19185-1-nigupta@nvidia.com
    Signed-off-by: Linus Torvalds

    Nitin Gupta
     
  • To prepare the workingset detection for anon LRU, this patch splits
    workingset event counters for refault, activate and restore into anon and
    file variants, as well as the refaults counter in struct lruvec.
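
    A hedged sketch of the resulting counter layout (illustrative enum; the
    patch defines an anon and a file variant for each event):

        /* each workingset event now comes in an (anon, file) pair */
        enum workingset_event {
                WORKINGSET_REFAULT_ANON,  WORKINGSET_REFAULT_FILE,
                WORKINGSET_ACTIVATE_ANON, WORKINGSET_ACTIVATE_FILE,
                WORKINGSET_RESTORE_ANON,  WORKINGSET_RESTORE_FILE,
        };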

    Signed-off-by: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Acked-by: Johannes Weiner
    Acked-by: Vlastimil Babka
    Cc: Hugh Dickins
    Cc: Matthew Wilcox
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Minchan Kim
    Link: http://lkml.kernel.org/r/1595490560-15117-4-git-send-email-iamjoonsoo.kim@lge.com
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     

08 Aug, 2020

3 commits

  • …kernel/git/sre/linux-power-supply") into android-mainline

    Merges along the way to 5.9-rc1

    resolves conflicts in:
    Documentation/ABI/testing/sysfs-class-power
    drivers/power/supply/power_supply_sysfs.c
    fs/crypto/inline_crypt.c

    Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
    Change-Id: Ia087834f54fb4e5269d68c3c404747ceed240701

    Greg Kroah-Hartman
     
  • Currently the kernel stack is being accounted per-zone. There is no need
    to do that. In addition, due to being per-zone, memcg has to keep a
    separate MEMCG_KERNEL_STACK_KB. Make the stat per-node and deprecate
    MEMCG_KERNEL_STACK_KB, since memcg_stat_item is an extension of
    node_stat_item. Also localize the kernel stack stats updates to
    account_kernel_stack().
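
    A hedged sketch of the localized update (mod_lruvec_page_state() and
    NR_KERNEL_STACK_KB are existing kernel symbols; details vary with how the
    stack was allocated):

        static void account_kernel_stack(struct task_struct *tsk, int account)
        {
                /* one per-node counter in KiB: +1 stack on fork, -1 on free */
                mod_lruvec_page_state(virt_to_page(tsk->stack),
                                      NR_KERNEL_STACK_KB,
                                      account * (THREAD_SIZE / 1024));
        }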

    Signed-off-by: Shakeel Butt
    Signed-off-by: Andrew Morton
    Reviewed-by: Roman Gushchin
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Link: http://lkml.kernel.org/r/20200630161539.1759185-1-shakeelb@google.com
    Signed-off-by: Linus Torvalds

    Shakeel Butt
     
  • To implement per-object slab memory accounting, we need to convert slab
    vmstat counters to bytes. Out of the four levels of counters (global,
    per-node, per-memcg and per-lruvec), only the last two require byte-sized
    counters: global and per-node counters count the number of slab pages,
    while per-memcg and per-lruvec counters count the amount of memory taken
    by charged slab objects.

    Converting all vmstat counters to bytes or even all slab counters to bytes
    would introduce an additional overhead. So instead let's store global and
    per-node counters in pages, and memcg and lruvec counters in bytes.

    To make the API clean all access helpers (both on the read and write
    sides) are dealing with bytes.

    To avoid back-and-forth conversions a new flavor of read-side helpers is
    introduced, which always returns values in pages: node_page_state_pages()
    and global_node_page_state_pages().

    The new helpers just read the raw values. The old helpers become simple
    wrappers, which complain on an attempt to read a byte-sized value, because
    at the moment no one actually needs bytes.
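
    A hedged sketch of that wrapper relationship (close to the actual
    helpers; vmstat_item_in_bytes() is the predicate distinguishing
    byte-sized items):

        unsigned long node_page_state_pages(struct pglist_data *pgdat,
                                            enum node_stat_item item)
        {
                return atomic_long_read(&pgdat->vm_stat[item]); /* raw pages */
        }

        unsigned long node_page_state(struct pglist_data *pgdat,
                                      enum node_stat_item item)
        {
                /* the old API must not be used for byte-sized items */
                VM_WARN_ON_ONCE(vmstat_item_in_bytes(item));
                return node_page_state_pages(pgdat, item);
        }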

    Thanks to Johannes Weiner for the idea of having the byte-sized API on top
    of the page-sized internal storage.

    Signed-off-by: Roman Gushchin
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Shakeel Butt
    Acked-by: Johannes Weiner
    Cc: Christoph Lameter
    Cc: Michal Hocko
    Cc: Tejun Heo
    Link: http://lkml.kernel.org/r/20200623174037.3951353-3-guro@fb.com
    Signed-off-by: Linus Torvalds

    Roman Gushchin
     

24 Jul, 2020

1 commit


05 Jun, 2020

1 commit

  • Use DEFINE_SEQ_ATTRIBUTE macro to simplify the code.
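
    A hedged before/after illustration (the "foo" names are made up; the
    macro, from include/linux/seq_file.h, expects a foo_sops seq_operations
    table):

        /* before: hand-rolled open handler and file_operations */
        static int foo_open(struct inode *inode, struct file *file)
        {
                return seq_open(file, &foo_sops);
        }

        static const struct file_operations foo_fops = {
                .open    = foo_open,
                .read    = seq_read,
                .llseek  = seq_lseek,
                .release = seq_release,
        };

        /* after: one line generates an equivalent foo_open/foo_fops pair */
        DEFINE_SEQ_ATTRIBUTE(foo);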

    Signed-off-by: Kefeng Wang
    Signed-off-by: Andrew Morton
    Cc: Anil S Keshavamurthy
    Cc: "David S. Miller"
    Cc: Greg KH
    Cc: Ingo Molnar
    Cc: Masami Hiramatsu
    Cc: Al Viro
    Link: http://lkml.kernel.org/r/20200509064031.181091-3-wangkefeng.wang@huawei.com
    Signed-off-by: Linus Torvalds

    Kefeng Wang
     

04 Jun, 2020

4 commits

  • Merge more updates from Andrew Morton:
    "More mm/ work, plenty more to come

    Subsystems affected by this patch series: slub, memcg, gup, kasan,
    pagealloc, hugetlb, vmscan, tools, mempolicy, memblock, hugetlbfs,
    thp, mmap, kconfig"

    * akpm: (131 commits)
    arm64: mm: use ARCH_HAS_DEBUG_WX instead of arch defined
    x86: mm: use ARCH_HAS_DEBUG_WX instead of arch defined
    riscv: support DEBUG_WX
    mm: add DEBUG_WX support
    drivers/base/memory.c: cache memory blocks in xarray to accelerate lookup
    mm/thp: rename pmd_mknotpresent() as pmd_mkinvalid()
    powerpc/mm: drop platform defined pmd_mknotpresent()
    mm: thp: don't need to drain lru cache when splitting and mlocking THP
    hugetlbfs: get unmapped area below TASK_UNMAPPED_BASE for hugetlbfs
    sparc32: register memory occupied by kernel as memblock.memory
    include/linux/memblock.h: fix minor typo and unclear comment
    mm, mempolicy: fix up gup usage in lookup_node
    tools/vm/page_owner_sort.c: filter out unneeded line
    mm: swap: memcg: fix memcg stats for huge pages
    mm: swap: fix vmstats for huge pages
    mm: vmscan: limit the range of LRU type balancing
    mm: vmscan: reclaim writepage is IO cost
    mm: vmscan: determine anon/file pressure balance at the reclaim root
    mm: balance LRU lists based on relative thrashing
    mm: only count actual rotations as LRU reclaim cost
    ...

    Linus Torvalds
     
  • Having statistics on pages scanned and pages reclaimed for both anon and
    file pages makes it easier to evaluate changes to LRU balancing.

    While at it, clean up the stat-keeping mess for isolation, putback,
    reclaim stats etc. a bit: first the physical LRU operation (isolation and
    putback), followed by vmstats, reclaim_stats, and then vm events.

    Signed-off-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Cc: Joonsoo Kim
    Cc: Michal Hocko
    Cc: Minchan Kim
    Cc: Rik van Riel
    Link: http://lkml.kernel.org/r/20200520232525.798933-3-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • The lowmem reserve protection of a zone can't tell us anything if the
    zone is empty; it only adds one more line to /proc/zoneinfo.

    Let's remove it from the output for such zones.

    Signed-off-by: Baoquan He
    Signed-off-by: Andrew Morton
    Link: http://lkml.kernel.org/r/20200402140113.3696-4-bhe@redhat.com
    Signed-off-by: Linus Torvalds

    Baoquan He
     
  • Pull networking updates from David Miller:

    1) Allow setting bluetooth L2CAP modes via socket option, from Luiz
    Augusto von Dentz.

    2) Add GSO partial support to igc, from Sasha Neftin.

    3) Several cleanups and improvements to r8169 from Heiner Kallweit.

    4) Add IF_OPER_TESTING link state and use it when ethtool triggers a
    device self-test. From Andrew Lunn.

    5) Start moving away from custom driver versions, use the globally
    defined kernel version instead, from Leon Romanovsky.

    6) Support GRO via gro_cells in DSA layer, from Alexander Lobakin.

    7) Allow hard IRQ deferral during NAPI, from Eric Dumazet.

    8) Add sriov and vf support to hinic, from Luo bin.

    9) Support Media Redundancy Protocol (MRP) in the bridging code, from
    Horatiu Vultur.

    10) Support netmap in the nft_nat code, from Pablo Neira Ayuso.

    11) Allow UDPv6 encapsulation of ESP in the ipsec code, from Sabrina
    Dubroca. Also add ipv6 support for espintcp.

    12) Lots of ReST conversions of the networking documentation, from Mauro
    Carvalho Chehab.

    13) Support configuration of ethtool rxnfc flows in bcmgenet driver,
    from Doug Berger.

    14) Allow to dump cgroup id and filter by it in inet_diag code, from
    Dmitry Yakunin.

    15) Add infrastructure to export netlink attribute policies to
    userspace, from Johannes Berg.

    16) Several optimizations to sch_fq scheduler, from Eric Dumazet.

    17) Fallback to the default qdisc if qdisc init fails because otherwise
    a packet scheduler init failure will make a device inoperative. From
    Jesper Dangaard Brouer.

    18) Several RISCV bpf jit optimizations, from Luke Nelson.

    19) Correct the return type of the ->ndo_start_xmit() method in several
    drivers, it's netdev_tx_t but many drivers were using
    'int'. From Yunjian Wang.

    20) Add an ethtool interface for PHY master/slave config, from Oleksij
    Rempel.

    21) Add BPF iterators, from Yonghong Song.

    22) Add cable test infrastructure, including ethtool interfaces, from
    Andrew Lunn. Marvell PHY driver is the first to support this
    facility.

    23) Remove zero-length arrays all over, from Gustavo A. R. Silva.

    24) Calculate and maintain an explicit frame size in XDP, from Jesper
    Dangaard Brouer.

    25) Add CAP_BPF, from Alexei Starovoitov.

    26) Support terse dumps in the packet scheduler, from Vlad Buslov.

    27) Support XDP_TX bulking in dpaa2 driver, from Ioana Ciornei.

    28) Add devm_register_netdev(), from Bartosz Golaszewski.

    29) Minimize qdisc resets, from Cong Wang.

    30) Get rid of kernel_getsockopt and kernel_setsockopt in order to
    eliminate set_fs/get_fs calls. From Christoph Hellwig.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2517 commits)
    selftests: net: ip_defrag: ignore EPERM
    net_failover: fixed rollback in net_failover_open()
    Revert "tipc: Fix potential tipc_aead refcnt leak in tipc_crypto_rcv"
    Revert "tipc: Fix potential tipc_node refcnt leak in tipc_rcv"
    vmxnet3: allow rx flow hash ops only when rss is enabled
    hinic: add set_channels ethtool_ops support
    selftests/bpf: Add a default $(CXX) value
    tools/bpf: Don't use $(COMPILE.c)
    bpf, selftests: Use bpf_probe_read_kernel
    s390/bpf: Use bcr 0,%0 as tail call nop filler
    s390/bpf: Maintain 8-byte stack alignment
    selftests/bpf: Fix verifier test
    selftests/bpf: Fix sample_cnt shared between two threads
    bpf, selftests: Adapt cls_redirect to call csum_level helper
    bpf: Add csum_level helper for fixing up csum levels
    bpf: Fix up bpf_skb_adjust_room helper's skb csum setting
    sfc: add missing annotation for efx_ef10_try_update_nic_stats_vf()
    crypto/chtls: IPv6 support for inline TLS
    Crypto/chcr: Fixes a coccinile check error
    Crypto/chcr: Fixes compilations warnings
    ...

    Linus Torvalds
     

03 Jun, 2020

2 commits

  • Merge updates from Andrew Morton:
    "A few little subsystems and a start of a lot of MM patches.

    Subsystems affected by this patch series: squashfs, ocfs2, parisc,
    vfs. With mm subsystems: slab-generic, slub, debug, pagecache, gup,
    swap, memcg, pagemap, memory-failure, vmalloc, kasan"

    * emailed patches from Andrew Morton: (128 commits)
    kasan: move kasan_report() into report.c
    mm/mm_init.c: report kasan-tag information stored in page->flags
    ubsan: entirely disable alignment checks under UBSAN_TRAP
    kasan: fix clang compilation warning due to stack protector
    x86/mm: remove vmalloc faulting
    mm: remove vmalloc_sync_(un)mappings()
    x86/mm/32: implement arch_sync_kernel_mappings()
    x86/mm/64: implement arch_sync_kernel_mappings()
    mm/ioremap: track which page-table levels were modified
    mm/vmalloc: track which page-table levels were modified
    mm: add functions to track page directory modifications
    s390: use __vmalloc_node in stack_alloc
    powerpc: use __vmalloc_node in alloc_vm_stack
    arm64: use __vmalloc_node in arch_alloc_vmap_stack
    mm: remove vmalloc_user_node_flags
    mm: switch the test_vmalloc module to use __vmalloc_node
    mm: remove __vmalloc_node_flags_caller
    mm: remove both instances of __vmalloc_node_flags
    mm: remove the prot argument to __vmalloc_node
    mm: remove the pgprot argument to __vmalloc
    ...

    Linus Torvalds
     
  • After an NFS page has been written it is considered "unstable" until a
    COMMIT request succeeds. If the COMMIT fails, the page will be
    re-written.

    These "unstable" pages are currently accounted as "reclaimable", either
    in WB_RECLAIMABLE, or in NR_UNSTABLE_NFS which is included in a
    'reclaimable' count. This might have made sense when sending the COMMIT
    required a separate action by the VFS/MM (e.g. releasepage() used to
    send a COMMIT). However now that all writes generated by ->writepages()
    will automatically be followed by a COMMIT (since commit 919e3bd9a875
    ("NFS: Ensure we commit after writeback is complete")) it makes more
    sense to treat them as writeback pages.

    So this patch removes NR_UNSTABLE_NFS and accounts unstable pages in
    NR_WRITEBACK and WB_WRITEBACK.

    A particular effect of this change is that when
    wb_check_background_flush() calls wb_over_bg_threshold(), the latter
    will report 'true' a lot less often as the 'unstable' pages are no
    longer considered 'dirty' (as there is nothing that writeback can do
    about them anyway).

    Currently wb_check_background_flush() will trigger writeback to NFS even
    when there are relatively few dirty pages (if there are lots of unstable
    pages), this can result in small writes going to the server (10s of
    Kilobytes rather than a Megabyte) which hurts throughput. With this
    patch, there are fewer writes which are each larger on average.

    Where the NR_UNSTABLE_NFS count was included in statistics
    virtual-files, the entry is retained, but the value is hard-coded as
    zero. Static trace points and warning printks which mentioned this
    counter no longer report it.
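
    For instance, in /proc/meminfo the line survives with a pinned value
    (hedged sketch; show_val_kb() is the existing meminfo helper):

        /* NR_UNSTABLE_NFS is gone; keep the line for ABI compatibility */
        show_val_kb(m, "NFS_Unstable:   ", 0);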

    [akpm@linux-foundation.org: re-layout comment]
    [akpm@linux-foundation.org: fix printk warning]
    Signed-off-by: NeilBrown
    Signed-off-by: Andrew Morton
    Reviewed-by: Jan Kara
    Reviewed-by: Christoph Hellwig
    Acked-by: Trond Myklebust
    Acked-by: Michal Hocko [mm]
    Cc: Christoph Hellwig
    Cc: Chuck Lever
    Link: http://lkml.kernel.org/r/87d06j7gqa.fsf@notabene.neil.brown.name
    Signed-off-by: Linus Torvalds

    NeilBrown
     

15 May, 2020

1 commit


27 Apr, 2020

1 commit

  • Instead of having all the sysctl handlers deal with user pointers, which
    is rather hairy in terms of the BPF interaction, copy the input to and
    from userspace in common code. This also means that the strings are
    always NUL-terminated by the common code, making the API a little bit
    safer.

    As most handlers just pass the data through to one of the common
    handlers, a lot of the changes are mechanical.
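
    After the conversion, handlers receive plain kernel pointers. A hedged
    sketch of the resulting common-handler signature:

        /* before: void __user *buffer; after: a plain kernel buffer */
        int proc_dointvec(struct ctl_table *table, int write,
                          void *buffer, size_t *lenp, loff_t *ppos);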

    Signed-off-by: Christoph Hellwig
    Acked-by: Andrey Ignatov
    Signed-off-by: Al Viro

    Christoph Hellwig
     

08 Apr, 2020

2 commits

  • The thp_fault_fallback and thp_file_fallback vmstats are incremented if
    either the hugepage allocation fails through the page allocator or the
    hugepage charge fails through mem cgroup.

    This patch leaves this field untouched but adds two new fields,
    thp_{fault,file}_fallback_charge, which are incremented only when the mem
    cgroup charge fails.

    This distinguishes between attempted hugepage allocations that fail due to
    fragmentation (or low memory conditions) and those that fail due to mem
    cgroup limits. That can be used to determine the impact of fragmentation
    on the system by excluding faults that failed due to memcg usage.
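
    A hedged sketch of where the two counters diverge in the fault path
    (illustrative control flow, not the literal patch; alloc_thp() is a
    hypothetical helper, and mem_cgroup_charge() stands in for whatever
    charge call the kernel of that era used):

        static vm_fault_t thp_fault_sketch(struct vm_area_struct *vma,
                                           unsigned long haddr, gfp_t gfp)
        {
                struct page *page = alloc_thp(vma, haddr); /* hypothetical */

                if (!page) {
                        count_vm_event(THP_FAULT_FALLBACK); /* alloc failed */
                        return VM_FAULT_FALLBACK;
                }
                if (mem_cgroup_charge(page, vma->vm_mm, gfp)) {
                        count_vm_event(THP_FAULT_FALLBACK);        /* as before */
                        count_vm_event(THP_FAULT_FALLBACK_CHARGE); /* new */
                        put_page(page);
                        return VM_FAULT_FALLBACK;
                }
                return 0;
        }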

    Signed-off-by: David Rientjes
    Signed-off-by: Andrew Morton
    Reviewed-by: Yang Shi
    Acked-by: Kirill A. Shutemov
    Cc: Mike Rapoport
    Cc: Jeremy Cline
    Cc: Andrea Arcangeli
    Cc: Mike Kravetz
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Link: http://lkml.kernel.org/r/alpine.DEB.2.21.2003061422070.7412@chino.kir.corp.google.com
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • The existing thp_fault_fallback indicates when thp attempts to allocate a
    hugepage but fails, or if the hugepage cannot be charged to the mem cgroup
    hierarchy.

    Extend this to shmem as well. Add a new thp_file_fallback counter, which
    complements thp_file_alloc and is incremented when a hugepage allocation
    is attempted but fails, or when the page cannot be charged to the mem
    cgroup hierarchy.

    Additionally, remove the check for CONFIG_TRANSPARENT_HUGE_PAGECACHE from
    shmem_alloc_hugepage() since it is only called with this configuration
    option.

    Signed-off-by: David Rientjes
    Signed-off-by: Andrew Morton
    Reviewed-by: Yang Shi
    Acked-by: Kirill A. Shutemov
    Cc: Mike Rapoport
    Cc: Jeremy Cline
    Cc: Andrea Arcangeli
    Cc: Mike Kravetz
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Link: http://lkml.kernel.org/r/alpine.DEB.2.21.2003061421240.7412@chino.kir.corp.google.com
    Signed-off-by: Linus Torvalds

    David Rientjes
     

03 Apr, 2020

1 commit

  • Now that pages are "DMA-pinned" via pin_user_page*(), and unpinned via
    unpin_user_pages*(), we need some visibility into whether all of this is
    working correctly.

    Add two new fields to /proc/vmstat:

    nr_foll_pin_acquired
    nr_foll_pin_released

    These are documented in Documentation/core-api/pin_user_pages.rst. They
    represent the number of pages (since boot time) that have been pinned
    ("nr_foll_pin_acquired") and unpinned ("nr_foll_pin_released"), via
    pin_user_pages*() and unpin_user_pages*().

    In the absence of long-running DMA or RDMA operations that hold pages
    pinned, the above two fields will normally be equal to each other.
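
    A small standalone check of that invariant, reporting the number of pages
    currently pinned as the difference of the two counters:

        #include <stdio.h>
        #include <string.h>

        int main(void)
        {
                char name[128];
                unsigned long long val, acquired = 0, released = 0;
                FILE *f = fopen("/proc/vmstat", "r");

                if (!f)
                        return 1;
                while (fscanf(f, "%127s %llu", name, &val) == 2) {
                        if (!strcmp(name, "nr_foll_pin_acquired"))
                                acquired = val;
                        else if (!strcmp(name, "nr_foll_pin_released"))
                                released = val;
                }
                fclose(f);
                printf("currently pinned: %llu pages\n", acquired - released);
                return 0;
        }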

    Also: update Documentation/core-api/pin_user_pages.rst, to remove an
    earlier (now confirmed untrue) claim about a performance problem with
    /proc/vmstat.

    Also: update Documentation/core-api/pin_user_pages.rst to rename the new
    /proc/vmstat entries to the names listed here.

    Signed-off-by: John Hubbard
    Signed-off-by: Andrew Morton
    Reviewed-by: Jan Kara
    Acked-by: Kirill A. Shutemov
    Cc: Ira Weiny
    Cc: Jérôme Glisse
    Cc: "Matthew Wilcox (Oracle)"
    Cc: Al Viro
    Cc: Christoph Hellwig
    Cc: Dan Williams
    Cc: Dave Chinner
    Cc: Jason Gunthorpe
    Cc: Jonathan Corbet
    Cc: Michal Hocko
    Cc: Mike Kravetz
    Cc: Shuah Khan
    Cc: Vlastimil Babka
    Link: http://lkml.kernel.org/r/20200211001536.1027652-9-jhubbard@nvidia.com
    Signed-off-by: Linus Torvalds

    John Hubbard
     

05 Dec, 2019

2 commits

  • Use common names from the vmstat array when possible. This makes little
    difference in code size for now, but should help in keeping interfaces
    consistent.

    add/remove: 0/2 grow/shrink: 2/0 up/down: 70/-72 (-2)
    Function old new delta
    memory_stat_format 984 1050 +66
    memcg_stat_show 957 961 +4
    memcg1_event_names 32 - -32
    mem_cgroup_lru_names 40 - -40
    Total: Before=14485337, After=14485335, chg -0.00%

    Link: http://lkml.kernel.org/r/157113012508.453.80391533767219371.stgit@buzz
    Signed-off-by: Konstantin Khlebnikov
    Acked-by: Andrew Morton
    Cc: Michal Hocko
    Cc: Vladimir Davydov
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     
  • Statistics in vmstat are combined from counters with different
    structures, but their names are merged into one array.

    This patch adds trivial helpers to get name for each item:

    const char *zone_stat_name(enum zone_stat_item item);
    const char *numa_stat_name(enum numa_stat_item item);
    const char *node_stat_name(enum node_stat_item item);
    const char *writeback_stat_name(enum writeback_stat_item item);
    const char *vm_event_name(enum vm_event_item item);

    Names for enum writeback_stat_item are folded into the middle of
    vmstat_text, so this patch moves the declaration into a header to
    calculate the offsets of the following items.

    This patch also reuses a piece of the node stat names for the lru list
    names:

    const char *lru_list_name(enum lru_list lru);

    This returns common lru list names: "inactive_anon", "active_anon",
    "inactive_file", "active_file", "unevictable".

    [khlebnikov@yandex-team.ru: do not use size of vmstat_text as count of /proc/vmstat items]
    Link: http://lkml.kernel.org/r/157152151769.4139.15423465513138349343.stgit@buzz
    Link: https://lore.kernel.org/linux-mm/cd1c42ae-281f-c8a8-70ac-1d01d417b2e1@infradead.org/T/#u
    Link: http://lkml.kernel.org/r/157113012325.453.562783073839432766.stgit@buzz
    Signed-off-by: Konstantin Khlebnikov
    Reviewed-by: Andrew Morton
    Cc: Randy Dunlap
    Cc: Michal Hocko
    Cc: Vladimir Davydov
    Cc: Johannes Weiner
    Cc: YueHaibing
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     

07 Nov, 2019

2 commits

  • pagetypeinfo_showfree_print is called with zone->lock held in IRQ mode.
    This is not really nice because it blocks both any interrupts on that
    cpu and the page allocator. On large machines this might even trigger
    the hard lockup detector.

    Considering pagetypeinfo is a debugging tool, we do not really need
    exact numbers here. The primary reason to look at the output is to see
    how pageblocks are spread among different migratetypes, and a low number
    of pages is much more interesting; therefore, putting a bound on the
    number of pages on the free_list sounds like a reasonable tradeoff.

    The new output will simply report
    [...]
    Node 6, zone Normal, type Movable >100000 >100000 >100000 >100000 41019 31560 23996 10054 3229 983 648

    instead of
    Node 6, zone Normal, type Movable 399568 294127 221558 102119 41019 31560 23996 10054 3229 983 648

    The limit has been chosen arbitrarily and is subject to future change
    should there be a need for that.

    While we are at it, also drop the zone lock after each free_list
    iteration, which will help IRQ and page allocator responsiveness even
    further, as the lock hold time is now always bounded by those 100k
    pages.
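
    A hedged sketch of the bounded walk with the periodic lock drop
    (illustrative fragment; the 100k cap is the one discussed above):

        list_for_each(curr, &area->free_list[mtype]) {
                if (++freecount >= 100000) {
                        overflow = true;
                        break;
                }
        }
        seq_printf(m, "%s%6lu ", overflow ? ">" : "", freecount);

        /* give IRQs and the page allocator a chance between free_lists */
        spin_unlock_irq(&zone->lock);
        cond_resched();
        spin_lock_irq(&zone->lock);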

    [akpm@linux-foundation.org: tweak comment text, per David Hildenbrand]
    Link: http://lkml.kernel.org/r/20191025072610.18526-3-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Suggested-by: Andrew Morton
    Reviewed-by: Waiman Long
    Acked-by: Vlastimil Babka
    Acked-by: David Hildenbrand
    Acked-by: Rafael Aquini
    Acked-by: David Rientjes
    Reviewed-by: Andrew Morton
    Cc: Greg Kroah-Hartman
    Cc: Jann Horn
    Cc: Johannes Weiner
    Cc: Konstantin Khlebnikov
    Cc: Mel Gorman
    Cc: Roman Gushchin
    Cc: Song Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • /proc/pagetypeinfo is a debugging tool for examining internal page
    allocator state with respect to fragmentation. It is not very useful for
    anything else, so normal users really do not need to read this file.

    Waiman Long has noticed that reading this file can have negative side
    effects, because zone->lock is necessary for gathering the data, and that
    a) interferes with the page allocator and its users and b) can lead to
    hard lockups on large machines which have very long free_lists.

    Reduce both issues by simply not exporting the file to regular users.
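
    A hedged sketch of the change (restricting the proc entry to mode 0400 so
    only root can read it):

        proc_create_seq("pagetypeinfo", 0400, NULL, &pagetypeinfo_op);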

    Link: http://lkml.kernel.org/r/20191025072610.18526-2-mhocko@kernel.org
    Fixes: 467c996c1e19 ("Print out statistics in relation to fragmentation avoidance to /proc/pagetypeinfo")
    Signed-off-by: Michal Hocko
    Reported-by: Waiman Long
    Acked-by: Mel Gorman
    Acked-by: Vlastimil Babka
    Acked-by: Waiman Long
    Acked-by: Rafael Aquini
    Acked-by: David Rientjes
    Reviewed-by: Andrew Morton
    Cc: David Hildenbrand
    Cc: Johannes Weiner
    Cc: Roman Gushchin
    Cc: Konstantin Khlebnikov
    Cc: Jann Horn
    Cc: Song Liu
    Cc: Greg Kroah-Hartman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

25 Sep, 2019

1 commit

  • In preparation for non-shmem THP, this patch adds a few stats and exposes
    them in /proc/meminfo, /sys/bus/node/devices//meminfo, and
    /proc//task//smaps.

    This patch is mostly a rewrite of Kirill A. Shutemov's earlier version:
    https://lkml.kernel.org/r/20170126115819.58875-5-kirill.shutemov@linux.intel.com/

    Link: http://lkml.kernel.org/r/20190801184244.3169074-5-songliubraving@fb.com
    Signed-off-by: Song Liu
    Acked-by: Rik van Riel
    Acked-by: Kirill A. Shutemov
    Acked-by: Johannes Weiner
    Cc: Hillf Danton
    Cc: Hugh Dickins
    Cc: William Kucharski
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Song Liu
     

21 May, 2019

1 commit

  • Add SPDX license identifiers to all files which:

    - Have no license information of any form

    - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
    initial scan/conversion to ignore the file

    These files fall under the project license, GPL v2 only. The resulting SPDX
    license identifier is:

    GPL-2.0-only
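
    Concretely, each affected file gains this as its first line:

        // SPDX-License-Identifier: GPL-2.0-only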

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

20 Apr, 2019

1 commit

  • Commit 58bc4c34d249 ("mm/vmstat.c: skip NR_TLB_REMOTE_FLUSH* properly")
    depends on skipping vmstat entries with empty name introduced in
    7aaf77272358 ("mm: don't show nr_indirectly_reclaimable in
    /proc/vmstat") but reverted in b29940c1abd7 ("mm: rename and change
    semantics of nr_indirectly_reclaimable_bytes").

    So skipping no longer works and /proc/vmstat has misformatted lines " 0".

    This patch simply shows debug counters "nr_tlb_remote_*" for UP.

    Link: http://lkml.kernel.org/r/155481488468.467.4295519102880913454.stgit@buzz
    Fixes: 58bc4c34d249 ("mm/vmstat.c: skip NR_TLB_REMOTE_FLUSH* properly")
    Signed-off-by: Konstantin Khlebnikov
    Acked-by: Vlastimil Babka
    Cc: Roman Gushchin
    Cc: Jann Horn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     

06 Mar, 2019

1 commit

  • When calling debugfs functions, there is no need to ever check the
    return value. The function can work or not, but the code logic should
    never do something different based on this.
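
    A hedged before/after sketch, using vmstat's own "extfrag" debugfs
    directory as the example:

        /* before: error handling that changes nothing */
        struct dentry *d = debugfs_create_dir("extfrag", NULL);
        if (!d)
                return -ENOMEM;

        /* after: just call it; a debugfs failure is not fatal */
        debugfs_create_dir("extfrag", NULL);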

    Link: http://lkml.kernel.org/r/20190122152151.16139-14-gregkh@linuxfoundation.org
    Signed-off-by: Greg Kroah-Hartman
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Cc: David Rientjes
    Cc: Laura Abbott
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Greg Kroah-Hartman
     

29 Dec, 2018

1 commit

  • totalram_pages, zone->managed_pages and totalhigh_pages updates are
    protected by managed_page_count_lock, but readers never care about it.
    Convert these variables to atomic to avoid readers potentially seeing a
    store tear.

    This patch converts zone->managed_pages. Subsequent patches will convert
    totalram_pages, totalhigh_pages, and eventually managed_page_count_lock
    will be removed.

    The main motivation was that managed_page_count_lock handling was
    complicating things. It was discussed at length here,
    https://lore.kernel.org/patchwork/patch/995739/#1181785 - so it seems
    better to remove the lock and convert the variables to atomic, with
    preventing potential store-to-read tearing as a bonus.
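
    A hedged sketch of the reader side after the conversion (matching the
    style of the zone_managed_pages() helper):

        /* was: unsigned long managed_pages; now: atomic_long_t managed_pages */
        static inline unsigned long zone_managed_pages(struct zone *zone)
        {
                return (unsigned long)atomic_long_read(&zone->managed_pages);
        }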

    Link: http://lkml.kernel.org/r/1542090790-21750-3-git-send-email-arunks@codeaurora.org
    Signed-off-by: Arun KS
    Suggested-by: Michal Hocko
    Suggested-by: Vlastimil Babka
    Reviewed-by: Konstantin Khlebnikov
    Reviewed-by: David Hildenbrand
    Acked-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Reviewed-by: Pavel Tatashin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun KS
     

19 Nov, 2018

1 commit

  • Scan through the whole array to see if an update is needed. While we're
    at it, use sizeof() to be safe against any possible type changes in the
    future.

    The bug here is that we wouldn't sync per-cpu counters into global ones
    if there was an update of numa_stats for higher cpus. This is highly
    theoretical, though, because it is much more probable that zone_stats
    are updated, so we would refresh anyway. So I wouldn't bother to mark
    this for stable, yet it is something nice to fix.
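
    A hedged sketch of the fixed scan (illustrative; the real check is in
    vmstat's need_update(), and memchr_inv() returns non-NULL if any byte
    differs from the given value):

        /* scan both per-cpu diff arrays in full; sizeof() keeps the scan
         * correct even if the element type changes later */
        if (memchr_inv(p->vm_stat_diff, 0, sizeof(p->vm_stat_diff)) ||
            memchr_inv(p->vm_numa_stat_diff, 0, sizeof(p->vm_numa_stat_diff)))
                return true;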

    [mhocko@suse.com: changelog enhancement]
    Link: http://lkml.kernel.org/r/1541601517-17282-1-git-send-email-janne.huttunen@nokia.com
    Fixes: 1d90ca897cb0 ("mm: update NUMA counter threshold size")
    Signed-off-by: Janne Huttunen
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Janne Huttunen
     

27 Oct, 2018

2 commits

  • Having two gigantic arrays that must manually be kept in sync, including
    ifdefs, isn't exactly robust. To make it easier to catch such issues in
    the future, add a BUILD_BUG_ON().
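
    A hedged sketch of such a check (the real one sums the item counts of
    every category that feeds vmstat_text):

        BUILD_BUG_ON(ARRAY_SIZE(vmstat_text) <
                     NR_VM_ZONE_STAT_ITEMS + NR_VM_NUMA_STAT_ITEMS +
                     NR_VM_NODE_STAT_ITEMS + NR_VM_WRITEBACK_STAT_ITEMS +
                     NR_VM_EVENT_ITEMS);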

    Link: http://lkml.kernel.org/r/20181001143138.95119-3-jannh@google.com
    Signed-off-by: Jann Horn
    Reviewed-by: Kees Cook
    Reviewed-by: Andrew Morton
    Acked-by: Roman Gushchin
    Acked-by: Michal Hocko
    Cc: Davidlohr Bueso
    Cc: Oleg Nesterov
    Cc: Christoph Lameter
    Cc: Kemi Wang
    Cc: Andy Lutomirski
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jann Horn
     
  • Make it easier to catch bugs in the shadow node shrinker by adding a
    counter for the shadow nodes in circulation.

    [akpm@linux-foundation.org: assert that irqs are disabled, for __inc_lruvec_page_state()]
    [akpm@linux-foundation.org: s/WARN_ON_ONCE/VM_WARN_ON_ONCE/, per Johannes]
    Link: http://lkml.kernel.org/r/20181009184732.762-4-hannes@cmpxchg.org
    Signed-off-by: Johannes Weiner
    Reviewed-by: Andrew Morton
    Acked-by: Peter Zijlstra (Intel)
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner