18 Jan, 2021
2 commits
-
This reverts commit 6f080d4eb47ff10163ebbd308d7bd0561f573a37.
This patch breaks the GKI requirement, so revert this commit.
Change-Id: Ia287fe511f759e351b2d3331f5238c8ddef6ef54
-
Export some functions to support building trusty as modules.
Signed-off-by: Ji Luo
Change-Id: Iea7aac29a6ceedbee518df11bdbf1e461db23a33
26 Oct, 2020
1 commit
-
…nux/kernel/git/mchehab/linux-media") into android-mainline
Steps on the way to 5.10-rc1
Resolves conflicts in:
fs/userfaultfd.c
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ie3fe3c818f1f6565cfd4fa551de72d2b72ef60af
25 Oct, 2020
1 commit
-
Steps on the way to 5.10-rc1
Change-Id: Iddc84c25b6a9d71fa8542b927d6f69c364131c3d
Signed-off-by: Greg Kroah-Hartman
21 Oct, 2020
1 commit
-
…linux/kernel/git/arm64/linux") into android-mainline
Tiny steps on the way to 5.10-rc1.
Change-Id: I8ff6cb398ac1c0623bf2cefd29860616d05be107
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
17 Oct, 2020
1 commit
-
The page cache needs to know whether the filesystem supports THPs so that
it doesn't send THPs to filesystems which can't handle them. Dave Chinner
points out that getting from the page mapping to the filesystem type is
too many steps (mapping->host->i_sb->s_type->fs_flags) so cache that
information in the address space flags.
Signed-off-by: Matthew Wilcox (Oracle)
Signed-off-by: Andrew Morton
Cc: Alexander Viro
Cc: "Matthew Wilcox (Oracle)"
Cc: Hugh Dickins
Cc: Song Liu
Cc: Rik van Riel
Cc: "Kirill A . Shutemov"
Cc: Johannes Weiner
Cc: Dave Chinner
Cc: Christoph Hellwig
Link: https://lkml.kernel.org/r/20200916032717.22917-1-willy@infradead.org
Signed-off-by: Linus Torvalds
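For illustration, a minimal sketch of caching that capability as an address_space flag bit (the AS_THP_SUPPORT name and the helper names here are assumed for the sketch, not quoted from the patch):

/*
 * Sketch: cache "this filesystem can handle THPs" in mapping->flags so the
 * page cache need not chase mapping->host->i_sb->s_type->fs_flags.
 * AS_THP_SUPPORT and the helpers below are assumed names.
 */
#define AS_THP_SUPPORT 6    /* assumed bit number in mapping->flags */

static inline void mapping_set_thp_support(struct address_space *mapping)
{
    set_bit(AS_THP_SUPPORT, &mapping->flags);   /* set once at inode setup */
}

static inline bool mapping_thp_support(struct address_space *mapping)
{
    return test_bit(AS_THP_SUPPORT, &mapping->flags);
}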
14 Oct, 2020
1 commit
-
Convert shmem_getpage_gfp() (the only remaining caller of
find_lock_entry()) to cope with a head page being returned instead of
the subpage for the index.
[willy@infradead.org: fix BUG()s]
Link: https://lore.kernel.org/linux-mm/20200912032042.GA6583@casper.infradead.org/
Signed-off-by: Matthew Wilcox (Oracle)
Signed-off-by: Andrew Morton
Cc: Alexey Dobriyan
Cc: Chris Wilson
Cc: Huang Ying
Cc: Hugh Dickins
Cc: Jani Nikula
Cc: Johannes Weiner
Cc: Matthew Auld
Cc: William Kucharski
Link: https://lkml.kernel.org/r/20200910183318.20139-8-willy@infradead.org
Signed-off-by: Linus Torvalds
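A hedged sketch of the caller-side adjustment; the wrapper name lookup_locked_subpage() is hypothetical, while find_lock_entry() and find_subpage() are the existing helpers:

/*
 * Sketch: find_lock_entry() now hands back the locked head page of a
 * compound page, so a caller maps it back to the subpage it asked for.
 * lookup_locked_subpage() is a hypothetical wrapper for illustration.
 */
static struct page *lookup_locked_subpage(struct address_space *mapping,
                                          pgoff_t index)
{
    struct page *page = find_lock_entry(mapping, index);

    if (page && !xa_is_value(page))
        page = find_subpage(page, index);   /* head + offset -> subpage */
    return page;
}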
13 Oct, 2020
1 commit
-
Pull arm64 updates from Will Deacon:
"There's quite a lot of code here, but much of it is due to the
addition of a new PMU driver as well as some arm64-specific selftests
which is an area where we've traditionally been lagging a bit.
In terms of exciting features, this includes support for the Memory
Tagging Extension which narrowly missed 5.9, hopefully allowing
userspace to run with use-after-free detection in production on CPUs
that support it. Work is ongoing to integrate the feature with KASAN
for 5.11.
Another change that I'm excited about (assuming they get the hardware
right) is preparing the ASID allocator for sharing the CPU page-table
with the SMMU. Those changes will also come in via Joerg with the
IOMMU pull.
We do stray outside of our usual directories in a few places, mostly
due to core changes required by MTE. Although much of this has been
Acked, there were a couple of places where we unfortunately didn't get
any review feedback.
Other than that, we ran into a handful of minor conflicts in -next,
but nothing that should pose any issues.
Summary:
- Userspace support for the Memory Tagging Extension introduced by
  Armv8.5. Kernel support (via KASAN) is likely to follow in 5.11.
- Selftests for MTE, Pointer Authentication and FPSIMD/SVE context
  switching.
- Fix and subsequent rewrite of our Spectre mitigations, including
  the addition of support for PR_SPEC_DISABLE_NOEXEC.
- Support for the Armv8.3 Pointer Authentication enhancements.
- Support for ASID pinning, which is required when sharing
  page-tables with the SMMU.
- MM updates, including treating flush_tlb_fix_spurious_fault() as a
  no-op.
- Perf/PMU driver updates, including addition of the ARM CMN PMU
  driver and also support to handle CPU PMU IRQs as NMIs.
- Allow prefetchable PCI BARs to be exposed to userspace using normal
  non-cacheable mappings.
- Implementation of ARCH_STACKWALK for unwinding.
- Improve reporting of unexpected kernel traps due to BPF JIT
  failure.
- Improve robustness of user-visible HWCAP strings and their
  corresponding numerical constants.
- Removal of TEXT_OFFSET.
- Removal of some unused functions, parameters and prototypes.
- Removal of MPIDR-based topology detection in favour of firmware
  description.
- Cleanups to handling of SVE and FPSIMD register state in
  preparation for potential future optimisation of handling across
  syscalls.
- Cleanups to the SDEI driver in preparation for support in KVM.
- Miscellaneous cleanups and refactoring work"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (148 commits)
Revert "arm64: initialize per-cpu offsets earlier"
arm64: random: Remove no longer needed prototypes
arm64: initialize per-cpu offsets earlier
kselftest/arm64: Check mte tagged user address in kernel
kselftest/arm64: Verify KSM page merge for MTE pages
kselftest/arm64: Verify all different mmap MTE options
kselftest/arm64: Check forked child mte memory accessibility
kselftest/arm64: Verify mte tag inclusion via prctl
kselftest/arm64: Add utilities and a test to validate mte memory
perf: arm-cmn: Fix conversion specifiers for node type
perf: arm-cmn: Fix unsigned comparison to less than zero
arm64: dbm: Invalidate local TLB when setting TCR_EL1.HD
arm64: mm: Make flush_tlb_fix_spurious_fault() a no-op
arm64: Add support for PR_SPEC_DISABLE_NOEXEC prctl() option
arm64: Pull in task_stack_page() to Spectre-v4 mitigation code
KVM: arm64: Allow patching EL2 vectors even if KASLR is not enabled
arm64: Get rid of arm64_ssbd_state
KVM: arm64: Convert ARCH_WORKAROUND_2 to arm64_get_spectre_v4_state()
KVM: arm64: Get rid of kvm_arm_have_ssbd()
KVM: arm64: Simplify handling of ARCH_WORKAROUND_2
...
21 Sep, 2020
1 commit
-
Linux 5.9-rc6
Signed-off-by: Greg Kroah-Hartman
Change-Id: I3bccdbb773bfc2c604742e6ff5983bf0b61ba0b5
20 Sep, 2020
1 commit
-
Commit e809d5f0b5c9 ("tmpfs: per-superblock i_ino support") made changes
to shmem_reserve_inode() in mm/shmem.c, however the original test for
(sbinfo->max_inodes) got dropped. This causes mounting tmpfs with option
nr_inodes=0 to fail:
# mount -ttmpfs -onr_inodes=0 none /ext0
mount: /ext0: mount(2) system call failed: Cannot allocate memory.
This patch restores the nr_inodes=0 functionality.
Fixes: e809d5f0b5c9 ("tmpfs: per-superblock i_ino support")
Signed-off-by: Byron Stanoszek
Signed-off-by: Andrew Morton
Acked-by: Hugh Dickins
Acked-by: Chris Down
Link: https://lkml.kernel.org/r/20200902035715.16414-1-gandalf@winds.org
Signed-off-by: Linus Torvalds
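A simplified sketch of the restored logic in shmem_reserve_inode() (error paths and the inode-number allocation are trimmed):

/*
 * Sketch: only consume free_inodes when the mount has a finite limit
 * (sbinfo->max_inodes != 0); this is the test that e809d5f0b5c9 dropped.
 */
static int shmem_reserve_inode(struct super_block *sb)
{
    struct shmem_sb_info *sbinfo = SHMEM_SB(sb);

    spin_lock(&sbinfo->stat_lock);
    if (sbinfo->max_inodes) {           /* restored check */
        if (!sbinfo->free_inodes) {
            spin_unlock(&sbinfo->stat_lock);
            return -ENOSPC;
        }
        sbinfo->free_inodes--;
    }
    spin_unlock(&sbinfo->stat_lock);
    return 0;
}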
04 Sep, 2020
2 commits
-
Arm's Memory Tagging Extension (MTE) adds some metadata (tags) to
every physical page; when swapping pages out to disk it is necessary to
save these tags, and later restore them when reading the pages back.
Add some hooks along with dummy implementations to enable the
arch code to handle this.
Three new hooks are added to the swap code:
* arch_prepare_to_swap() and
* arch_swap_invalidate_page() / arch_swap_invalidate_area().
One new hook is added to shmem:
* arch_swap_restore()
Signed-off-by: Steven Price
[catalin.marinas@arm.com: add unlock_page() on the error path]
[catalin.marinas@arm.com: dropped the _tags suffix]
Signed-off-by: Catalin Marinas
Acked-by: Andrew Morton
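The generic fallbacks are essentially no-ops; a sketch of what such stubs look like (signatures approximated from the hook names above):

/* Sketch of the generic no-op stubs; arm64 overrides these to save,
 * invalidate and restore the MTE tags attached to a page. */
static inline int arch_prepare_to_swap(struct page *page)
{
    return 0;   /* nothing to save without MTE */
}
static inline void arch_swap_invalidate_page(int type, pgoff_t offset)
{
}
static inline void arch_swap_invalidate_area(int type)
{
}
static inline void arch_swap_restore(swp_entry_t entry, struct page *page)
{
}
-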
Since arm64 memory (allocation) tags can only be stored in RAM, mapping
files with PROT_MTE is not allowed by default. RAM-based files like
those in a tmpfs mount or memfd_create() can support memory tagging, so
update the vm_flags accordingly in shmem_mmap().
Signed-off-by: Catalin Marinas
Acked-by: Andrew Morton
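A sketch of the shmem_mmap() change, assuming the arm64 VM_MTE_ALLOWED vma flag (a no-op define on other architectures):

static int shmem_mmap(struct file *file, struct vm_area_struct *vma)
{
    /* ... existing checks elided ... */
    vma->vm_flags |= VM_MTE_ALLOWED;   /* RAM-backed: PROT_MTE may be used */
    file_accessed(file);
    vma->vm_ops = &shmem_vm_ops;
    return 0;
}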
13 Aug, 2020
3 commits
-
…ernel/git/abelloni/linux") into android-mainline
Steps on the way to 5.9-rc1.
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Iceded779988ff472863b7e1c54e22a9fa6383a30
-
Drop the repeated word "the".
Signed-off-by: Randy Dunlap
Signed-off-by: Andrew Morton
Reviewed-by: Andrew Morton
Reviewed-by: Zi Yan
Link: http://lkml.kernel.org/r/20200801173822.14973-11-rdunlap@infradead.org
Signed-off-by: Linus Torvalds
-
Workingset detection for anonymous pages will be implemented in the
following patch, and it requires storing the shadow entries into the
swapcache. This patch implements an infrastructure to store the shadow
entry in the swapcache.
Signed-off-by: Joonsoo Kim
Signed-off-by: Andrew Morton
Acked-by: Johannes Weiner
Cc: Hugh Dickins
Cc: Matthew Wilcox
Cc: Mel Gorman
Cc: Michal Hocko
Cc: Minchan Kim
Cc: Vlastimil Babka
Link: http://lkml.kernel.org/r/1595490560-15117-5-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Linus Torvalds
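A hedged sketch of the infrastructure: the swap-cache deletion path gains a shadow argument and stores it in the slot instead of clearing it (parameter placement approximated):

/* Sketch: keep a shadow (eviction) entry in the swap-cache slot so a later
 * swapin can detect a refault; mirrors what the page cache already does. */
void __delete_from_swap_cache(struct page *page, swp_entry_t entry, void *shadow)
{
    struct address_space *address_space = swap_address_space(entry);
    XA_STATE(xas, &address_space->i_pages, swp_offset(entry));

    xas_store(&xas, shadow);            /* was: xas_store(&xas, NULL) */
    /* ... page flag updates and accounting elided ... */
}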
08 Aug, 2020
4 commits
-
…kernel/git/sre/linux-power-supply") into android-mainline
Merges along the way to 5.9-rc1
resolves conflicts in:
Documentation/ABI/testing/sysfs-class-power
drivers/power/supply/power_supply_sysfs.c
fs/crypto/inline_crypt.c
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ia087834f54fb4e5269d68c3c404747ceed240701
-
The current split between do_mmap() and do_mmap_pgoff() was introduced in
commit 1fcfd8db7f82 ("mm, mpx: add "vm_flags_t vm_flags" arg to
do_mmap_pgoff()") to support MPX.
The wrapper function do_mmap_pgoff() always passed 0 as the value of the
vm_flags argument to do_mmap(). However, MPX support has subsequently
been removed from the kernel and there were no more direct callers of
do_mmap(); all calls were going via do_mmap_pgoff().
Simplify the code by removing do_mmap_pgoff() and changing all callers to
directly call do_mmap(), which now no longer takes a vm_flags argument.
Signed-off-by: Peter Collingbourne
Signed-off-by: Andrew Morton
Reviewed-by: Andrew Morton
Reviewed-by: David Hildenbrand
Link: http://lkml.kernel.org/r/20200727194109.1371462-1-pcc@google.com
Signed-off-by: Linus Torvalds
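For reference, a sketch of the removed wrapper; it only ever forwarded a zero vm_flags, so every caller can call do_mmap() directly:

static inline unsigned long
do_mmap_pgoff(struct file *file, unsigned long addr, unsigned long len,
              unsigned long prot, unsigned long flags, unsigned long pgoff,
              unsigned long *populate, struct list_head *uf)
{
    return do_mmap(file, addr, len, prot, flags, /* vm_flags */ 0,
                   pgoff, populate, uf);
}
-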
The default is still set to inode32 for backwards compatibility, but
system administrators can opt in to the new 64-bit inode numbers by
either:
1. Passing inode64 on the command line when mounting, or
2. Configuring the kernel with CONFIG_TMPFS_INODE64=y
The inode64 and inode32 names are used based on existing precedent from
XFS.
[hughd@google.com: Kconfig fixes]
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008011928010.13320@eggly.anvils
Signed-off-by: Chris Down
Signed-off-by: Hugh Dickins
Signed-off-by: Andrew Morton
Reviewed-by: Amir Goldstein
Acked-by: Hugh Dickins
Cc: Al Viro
Cc: Matthew Wilcox
Cc: Jeff Layton
Cc: Johannes Weiner
Cc: Tejun Heo
Link: http://lkml.kernel.org/r/8b23758d0c66b5e2263e08baf9c4b6a7565cbd8f.1594661218.git.chris@chrisdown.name
Signed-off-by: Linus Torvalds
-
Patch series "tmpfs: inode: Reduce risk of inum overflow", v7.
In Facebook production we are seeing heavy i_ino wraparounds on tmpfs. On
affected tiers, in excess of 10% of hosts show multiple files with
different content and the same inode number, with some servers even having
as many as 150 duplicated inode numbers with differing file content.
This causes actual, tangible problems in production. For example, we have
complaints from those working on remote caches that their application is
reporting cache corruptions because it uses (device, inodenum) to
establish the identity of a particular cache object, but because it's not
unique any more, the application refuses to continue and reports cache
corruption. Even worse, sometimes applications may not even detect the
corruption but may continue anyway, causing phantom and hard to debug
behaviour.
In general, userspace applications expect that (device, inodenum) should
be enough to uniquely point to one inode, which seems fair enough. One
might also need to check the generation, but in this case:
1. That's not currently exposed to userspace
(ioctl(...FS_IOC_GETVERSION...) returns ENOTTY on tmpfs);
2. Even with generation, there shouldn't be two live inodes with the
same inode number on one device.
In order to mitigate this, we take a two-pronged approach:
1. Moving inum generation from being global to per-sb for tmpfs. This
itself allows some reduction in i_ino churn. This works on both 64-
and 32- bit machines.
2. Adding inode{64,32} for tmpfs. This fix is supported on machines with
64-bit ino_t only: we allow users to mount tmpfs with a new inode64
option that uses the full width of ino_t, or CONFIG_TMPFS_INODE64.
You can see how this compares to previous related patches which didn't
implement this per-superblock:
- https://patchwork.kernel.org/patch/11254001/
- https://patchwork.kernel.org/patch/11023915/
This patch (of 2):
get_next_ino has a number of problems:
- It uses and returns a uint, which is susceptible to become overflowed
if a lot of volatile inodes that use get_next_ino are created.
- It's global, with no specificity per-sb or even per-filesystem. This
means it's not that difficult to cause inode number wraparounds on a
single device, which can result in having multiple distinct inodes
with the same inode number.
This patch adds a per-superblock counter that mitigates the second case.
This design also allows us to later have a specific i_ino size per-device,
for example, allowing users to choose whether to use 32- or 64-bit inodes
for each tmpfs mount. This is implemented in the next commit.
For internal shmem mounts which may be less tolerant to spinlock delays,
we implement a percpu batching scheme which only takes the stat_lock at
each batch boundary.
Signed-off-by: Chris Down
Signed-off-by: Andrew Morton
Acked-by: Hugh Dickins
Cc: Amir Goldstein
Cc: Al Viro
Cc: Matthew Wilcox
Cc: Jeff Layton
Cc: Johannes Weiner
Cc: Tejun Heo
Link: http://lkml.kernel.org/r/cover.1594661218.git.chris@chrisdown.name
Link: http://lkml.kernel.org/r/1986b9d63b986f08ec07a4aa4b2275e718e47d8a.1594661218.git.chris@chrisdown.name
Signed-off-by: Linus Torvalds
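A hedged sketch of the two-pronged allocator (field and constant names approximated; the max_inodes accounting is elided):

#define SHMEM_INO_BATCH 1024    /* assumed batch size */

static int shmem_reserve_inode(struct super_block *sb, ino_t *inop)
{
    struct shmem_sb_info *sbinfo = SHMEM_SB(sb);
    ino_t ino;

    if (!(sb->s_flags & SB_KERNMOUNT)) {
        spin_lock(&sbinfo->stat_lock);
        ino = sbinfo->next_ino++;       /* per-sb counter, no longer global */
        spin_unlock(&sbinfo->stat_lock);
    } else {
        /* internal mounts: take stat_lock only once per batch */
        ino_t *next_ino = per_cpu_ptr(sbinfo->ino_batch, get_cpu());

        ino = *next_ino;
        if (unlikely(ino % SHMEM_INO_BATCH == 0)) {
            spin_lock(&sbinfo->stat_lock);
            ino = sbinfo->next_ino;
            sbinfo->next_ino += SHMEM_INO_BATCH;
            spin_unlock(&sbinfo->stat_lock);
        }
        *next_ino = ino + 1;
        put_cpu();
    }
    *inop = ino;
    return 0;
}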
25 Jul, 2020
2 commits
-
Partial 5.8-rc7 merge to make the final merge easier.
Signed-off-by: Greg Kroah-Hartman
Change-Id: I95f1b0a379e3810333300a70c5a93f449d945c54
-
After commit fdc85222d58e ("kernfs: kvmalloc xattr value instead of
kmalloc"), simple xattr entry is allocated with kvmalloc() instead of
kmalloc(), so we should release it with kvfree() instead of kfree().
Fixes: fdc85222d58e ("kernfs: kvmalloc xattr value instead of kmalloc")
Signed-off-by: Chengguang Xu
Signed-off-by: Andrew Morton
Acked-by: Hugh Dickins
Acked-by: Tejun Heo
Cc: Daniel Xu
Cc: Chris Down
Cc: Andreas Dilger
Cc: Greg Kroah-Hartman
Cc: Al Viro
Cc: [5.7]
Link: http://lkml.kernel.org/r/20200704051608.15043-1-cgxu519@mykernel.net
Signed-off-by: Linus Torvalds
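The change itself is a one-liner in shmem_initxattrs()'s error path, sketched from the description above:

new_xattr->name = kmalloc(XATTR_SECURITY_PREFIX_LEN + len, GFP_KERNEL);
if (!new_xattr->name) {
    kvfree(new_xattr);      /* was kfree(); the entry came from kvmalloc() */
    return -ENOMEM;
}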
24 Jun, 2020
1 commit
-
…cm/linux/kernel/git/linkinjeon/exfat") into android-mainline
Steps on the way to 5.8-rc1.
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I4bc42f572167ea2f815688b4d1eb6124b6d260d4
22 Jun, 2020
1 commit
-
Steps along the way to 5.8-rc1.
Signed-off-by: Greg Kroah-Hartman
Change-Id: I6cca4fa48322228c8182201d68dc05f9b72cfc50
10 Jun, 2020
2 commits
-
Convert comments that reference mmap_sem to reference mmap_lock instead.
[akpm@linux-foundation.org: fix up linux-next leftovers]
[akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil]
[akpm@linux-foundation.org: more linux-next fixups, per Michel]
Signed-off-by: Michel Lespinasse
Signed-off-by: Andrew Morton
Reviewed-by: Vlastimil Babka
Reviewed-by: Daniel Jordan
Cc: Davidlohr Bueso
Cc: David Rientjes
Cc: Hugh Dickins
Cc: Jason Gunthorpe
Cc: Jerome Glisse
Cc: John Hubbard
Cc: Laurent Dufour
Cc: Liam Howlett
Cc: Matthew Wilcox
Cc: Peter Zijlstra
Cc: Ying Han
Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com
Signed-off-by: Linus Torvalds
-
Patch series "mm: consolidate definitions of page table accessors", v2.
The low level page table accessors (pXY_index(), pXY_offset()) are
duplicated across all architectures and sometimes more than once. For
instance, we have 31 definition of pgd_offset() for 25 supported
architectures.
Most of these definitions are actually identical and typically it boils
down to, e.g.

    static inline unsigned long pmd_index(unsigned long address)
    {
        return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
    }

    static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
    {
        return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
    }

These definitions can be shared among 90% of the arches provided
XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined.
For architectures that really need a custom version there is always the
possibility to override the generic version with the usual ifdefs magic.
These patches introduce include/linux/pgtable.h that replaces
include/asm-generic/pgtable.h and add the definitions of the page table
accessors to the new header.
This patch (of 12):
The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the
functions involving page table manipulations, e.g. pte_alloc() and
pmd_alloc(). So, there is no point to explicitly include <asm/pgtable.h>
in the files that include <linux/mm.h>.
The include statements in such cases are removed with a simple loop:

    for f in $(git grep -l "include <linux/mm.h>") ; do
        sed -i -e '/include <asm\/pgtable.h>/ d' $f
    done

Signed-off-by: Mike Rapoport
Signed-off-by: Andrew Morton
Cc: Arnd Bergmann
Cc: Borislav Petkov
Cc: Brian Cain
Cc: Catalin Marinas
Cc: Chris Zankel
Cc: "David S. Miller"
Cc: Geert Uytterhoeven
Cc: Greentime Hu
Cc: Greg Ungerer
Cc: Guan Xuetao
Cc: Guo Ren
Cc: Heiko Carstens
Cc: Helge Deller
Cc: Ingo Molnar
Cc: Ley Foon Tan
Cc: Mark Salter
Cc: Matthew Wilcox
Cc: Matt Turner
Cc: Max Filippov
Cc: Michael Ellerman
Cc: Michal Simek
Cc: Mike Rapoport
Cc: Nick Hu
Cc: Paul Walmsley
Cc: Richard Weinberger
Cc: Rich Felker
Cc: Russell King
Cc: Stafford Horne
Cc: Thomas Bogendoerfer
Cc: Thomas Gleixner
Cc: Tony Luck
Cc: Vincent Chen
Cc: Vineet Gupta
Cc: Will Deacon
Cc: Yoshinori Sato
Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org
Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.org
Signed-off-by: Linus Torvalds
04 Jun, 2020
7 commits
-
They're the same function, and for the purpose of all callers they are
equivalent to lru_cache_add().
[akpm@linux-foundation.org: fix it for local_lock changes]
Signed-off-by: Johannes Weiner
Signed-off-by: Andrew Morton
Reviewed-by: Rik van Riel
Acked-by: Michal Hocko
Acked-by: Minchan Kim
Cc: Joonsoo Kim
Link: http://lkml.kernel.org/r/20200520232525.798933-5-hannes@cmpxchg.org
Signed-off-by: Linus Torvalds
-
Swapin faults were the last event to charge pages after they had already
been put on the LRU list. Now that we charge directly on swapin, the
lrucare portion of the charge code is unused.
Signed-off-by: Johannes Weiner
Signed-off-by: Andrew Morton
Reviewed-by: Joonsoo Kim
Cc: Alex Shi
Cc: Hugh Dickins
Cc: "Kirill A. Shutemov"
Cc: Michal Hocko
Cc: Roman Gushchin
Cc: Balbir Singh
Cc: Shakeel Butt
Link: http://lkml.kernel.org/r/20200508183105.225460-19-hannes@cmpxchg.org
Signed-off-by: Linus Torvalds
-
Right now, users that are otherwise memory controlled can easily escape
their containment and allocate significant amounts of memory that they're
not being charged for. That's because swap readahead pages are not being
charged until somebody actually faults them into their page table. This
can be exploited with MADV_WILLNEED, which triggers arbitrary readahead
allocations without charging the pages.
There are additional problems with the delayed charging of swap pages:
1. To implement refault/workingset detection for anonymous pages, we
need to have a target LRU available at swapin time, but the LRU is not
determinable until the page has been charged.
2. To implement per-cgroup LRU locking, we need page->mem_cgroup to be
stable when the page is isolated from the LRU; otherwise, the locks
change under us. But swapcache gets charged after it's already on the
LRU, and even if we cannot isolate it ourselves (since charging is not
exactly optional).
The previous patch ensured we always maintain cgroup ownership records for
swap pages. This patch moves the swapcache charging point from the fault
handler to swapin time to fix all of the above problems.
v2: simplify swapin error checking (Joonsoo)
[hughd@google.com: fix livelock in __read_swap_cache_async()]
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2005212246080.8458@eggly.anvils
Signed-off-by: Johannes Weiner
Signed-off-by: Hugh Dickins
Signed-off-by: Andrew Morton
Reviewed-by: Alex Shi
Cc: Hugh Dickins
Cc: Joonsoo Kim
Cc: "Kirill A. Shutemov"
Cc: Michal Hocko
Cc: Roman Gushchin
Cc: Shakeel Butt
Cc: Balbir Singh
Cc: Rafael Aquini
Cc: Alex Shi
Link: http://lkml.kernel.org/r/20200508183105.225460-17-hannes@cmpxchg.org
Signed-off-by: Linus Torvalds
-
Memcg maintains private MEMCG_CACHE and NR_SHMEM counters. This
divergence from the generic VM accounting means unnecessary code overhead,
and creates a dependency for memcg that page->mapping is set up at the
time of charging, so that page types can be told apart.
Convert the generic accounting sites to mod_lruvec_page_state and friends
to maintain the per-cgroup vmstat counters of NR_FILE_PAGES and NR_SHMEM.
The page is already locked in these places, so page->mem_cgroup is stable;
we only need minimal tweaks of two mem_cgroup_migrate() calls to ensure
it's set up in time.
Then replace MEMCG_CACHE with NR_FILE_PAGES and delete the private
NR_SHMEM accounting sites.
Signed-off-by: Johannes Weiner
Signed-off-by: Andrew Morton
Reviewed-by: Joonsoo Kim
Cc: Alex Shi
Cc: Hugh Dickins
Cc: "Kirill A. Shutemov"
Cc: Michal Hocko
Cc: Roman Gushchin
Cc: Shakeel Butt
Cc: Balbir Singh
Link: http://lkml.kernel.org/r/20200508183105.225460-10-hannes@cmpxchg.org
Signed-off-by: Linus Torvalds
-
The try/commit/cancel protocol that memcg uses dates back to when pages
used to be uncharged upon removal from the page cache, and thus couldn't
be committed before the insertion had succeeded. Nowadays, pages are
uncharged when they are physically freed; it doesn't matter whether the
insertion was successful or not. For the page cache, the transaction
dance has become unnecessary.
Introduce a mem_cgroup_charge() function that simply charges a newly
allocated page to a cgroup and sets up page->mem_cgroup in one single
step. If the insertion fails, the caller doesn't have to do anything but
free/put the page.
Then switch the page cache over to this new API.
Subsequent patches will also convert anon pages, but it needs a bit more
prep work. Right now, memcg depends on page->mapping being already set up
at the time of charging, so that it can maintain its own MEMCG_CACHE and
MEMCG_RSS counters. For anon, page->mapping is set under the same pte
lock under which the page is published, so a single charge point that can
block doesn't work there just yet.
The following prep patches will replace the private memcg counters with
the generic vmstat counters, thus removing the page->mapping dependency,
then complete the transition to the new single-point charge API and delete
the old transactional scheme.
v2: leave shmem swapcache when charging fails to avoid double IO (Joonsoo)
v3: rebase on preceding shmem simplification patch
Signed-off-by: Johannes Weiner
Signed-off-by: Andrew Morton
Reviewed-by: Alex Shi
Cc: Hugh Dickins
Cc: Joonsoo Kim
Cc: "Kirill A. Shutemov"
Cc: Michal Hocko
Cc: Roman Gushchin
Cc: Shakeel Butt
Cc: Balbir Singh
Link: http://lkml.kernel.org/r/20200508183105.225460-6-hannes@cmpxchg.org
Signed-off-by: Linus Torvalds
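A sketch of the difference for a page-cache insertion site (argument lists abbreviated; not quoted from the patch):

/* Before: the transactional dance around insertion */
if (mem_cgroup_try_charge(page, mm, gfp, &memcg))
    goto out;
if (add_to_page_cache_lru(page, mapping, index, gfp)) {
    mem_cgroup_cancel_charge(page, memcg);
    goto out;
}
mem_cgroup_commit_charge(page, memcg, false /* lrucare */);

/* After: one step; on failure the caller just frees/puts the page */
if (mem_cgroup_charge(page, mm, gfp))
    goto out;
if (add_to_page_cache_lru(page, mapping, index, gfp))
    goto out;
-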
Commit 215c02bc33bb ("tmpfs: fix shmem_getpage_gfp() VM_BUG_ON")
recognized that hole punching can race with swapin and removed the
BUG_ON() for a truncated entry from the swapin path.
The patch also added a swapcache deletion to optimize this rare case:
Since swapin has the page locked, and free_swap_and_cache() merely
trylocks, this situation can leave the page stranded in swapcache.
Usually, page reclaim picks up stale swapcache pages, and the race can
happen at any other time when the page is locked. (The same happens for
non-shmem swapin racing with page table zapping.) The thinking here was:
we already observed the race and we have the page locked, we may as well
do the cleanup instead of waiting for reclaim.
However, this optimization complicates the next patch which moves the
cgroup charging code around. As this is just a minor speedup for a race
condition that is so rare that it required a fuzzer to trigger the
original BUG_ON(), it's no longer worth the complications.
Suggested-by: Hugh Dickins
Signed-off-by: Johannes Weiner
Signed-off-by: Andrew Morton
Acked-by: Hugh Dickins
Cc: Alex Shi
Cc: Joonsoo Kim
Cc: Shakeel Butt
Cc: "Kirill A. Shutemov"
Cc: Michal Hocko
Cc: Roman Gushchin
Cc: Balbir Singh
Link: http://lkml.kernel.org/r/20200511181056.GA339505@cmpxchg.org
Signed-off-by: Linus Torvalds
-
The memcg charging API carries a boolean @compound parameter that tells
whether the page we're dealing with is a hugepage.
mem_cgroup_commit_charge() has another boolean @lrucare that indicates
whether the page needs LRU locking or not while charging. The majority of
callsites know those parameters at compile time, which results in a lot of
naked "false, false" argument lists. This makes for cryptic code and is a
breeding ground for subtle mistakes.
Thankfully, the huge page state can be inferred from the page itself and
doesn't need to be passed along. This is safe because charging completes
before the page is published and somebody may split it.
Simplify the callsites by removing @compound, and let memcg infer the
state by using hpage_nr_pages() unconditionally. That function does
PageTransHuge() to identify huge pages, which also helpfully asserts that
nobody passes in tail pages by accident.
The following patches will introduce a new charging API, best not to carry
over unnecessary weight.
Signed-off-by: Johannes Weiner
Signed-off-by: Andrew Morton
Reviewed-by: Alex Shi
Reviewed-by: Joonsoo Kim
Reviewed-by: Shakeel Butt
Cc: Hugh Dickins
Cc: "Kirill A. Shutemov"
Cc: Michal Hocko
Cc: Roman Gushchin
Cc: Balbir Singh
Link: http://lkml.kernel.org/r/20200508183105.225460-4-hannes@cmpxchg.org
Signed-off-by: Linus Torvalds
27 Apr, 2020
1 commit
-
Linux 5.7-rc3
Signed-off-by: Greg Kroah-Hartman
Change-Id: I7ac2896e866a898b1f42fc8fd6d60b40a75ae1f0
22 Apr, 2020
3 commits
-
Syzbot reported the below lockdep splat:
WARNING: possible irq lock inversion dependency detected
5.6.0-rc7-syzkaller #0 Not tainted
--------------------------------------------------------
syz-executor.0/10317 just changed the state of lock:
ffff888021d16568 (&(&info->lock)->rlock){+.+.}, at: spin_lock include/linux/spinlock.h:338 [inline]
ffff888021d16568 (&(&info->lock)->rlock){+.+.}, at: shmem_mfill_atomic_pte+0x1012/0x21c0 mm/shmem.c:2407
but this lock was taken by another, SOFTIRQ-safe lock in the past:
(&(&xa->xa_lock)->rlock#5){..-.}
and interrupts could create inverse lock ordering between them.
other info that might help us debug this:
Possible interrupt unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&(&info->lock)->rlock);
                               local_irq_disable();
                               lock(&(&xa->xa_lock)->rlock#5);
                               lock(&(&info->lock)->rlock);
  <Interrupt>
    lock(&(&xa->xa_lock)->rlock#5);

 *** DEADLOCK ***
The full report is quite lengthy, please see:
https://lore.kernel.org/linux-mm/alpine.LSU.2.11.2004152007370.13597@eggly.anvils/T/#m813b412c5f78e25ca8c6c7734886ed4de43f241d
It is because CPU 0 held info->lock with IRQ enabled in userfaultfd_copy
path, then CPU 1 is splitting a THP which held xa_lock and info->lock in
IRQ disabled context at the same time. If softirq comes in to acquire
xa_lock, the deadlock would be triggered.
The fix is to acquire/release info->lock with the *_irq version instead of
plain spin_{lock,unlock} to make it softirq safe.
Fixes: 4c27fe4c4c84 ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support")
Reported-by: syzbot+e27980339d305f2dbfd9@syzkaller.appspotmail.com
Signed-off-by: Yang Shi
Signed-off-by: Andrew Morton
Tested-by: syzbot+e27980339d305f2dbfd9@syzkaller.appspotmail.com
Acked-by: Hugh Dickins
Cc: Andrea Arcangeli
Link: http://lkml.kernel.org/r/1587061357-122619-1-git-send-email-yang.shi@linux.alibaba.com
Signed-off-by: Linus Torvalds
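The fix itself is mechanical; a sketch of the affected section of shmem_mfill_atomic_pte():

spin_lock_irq(&info->lock);             /* was spin_lock(&info->lock) */
info->alloced++;
inode->i_blocks += BLOCKS_PER_PAGE;
shmem_recalc_inode(inode);
spin_unlock_irq(&info->lock);           /* was spin_unlock(&info->lock) */
-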
Recent commit 71725ed10c40 ("mm: huge tmpfs: try to split_huge_page()
when punching hole") has allowed syzkaller to probe deeper, uncovering a
long-standing lockdep issue between the irq-unsafe shmlock_user_lock,
the irq-safe xa_lock on mapping->i_pages, and shmem inode's info->lock
which nests inside xa_lock (or tree_lock) since 4.8's shmem_uncharge().
user_shm_lock(), servicing SysV shmctl(SHM_LOCK), wants
shmlock_user_lock while its caller shmem_lock() holds info->lock with
interrupts disabled; but hugetlbfs_file_setup() calls user_shm_lock()
with interrupts enabled, and might be interrupted by a writeback endio
wanting xa_lock on i_pages.
This may not risk an actual deadlock, since shmem inodes do not take
part in writeback accounting, but there are several easy ways to avoid
it.
Requiring interrupts disabled for shmlock_user_lock would be easy, but
it's a high-level global lock for which that seems inappropriate.
Instead, recall that the use of info->lock to guard info->flags in
shmem_lock() dates from pre-3.1 days, when races with SHMEM_PAGEIN and
SHMEM_TRUNCATE could occur: nowadays it serves no purpose, the only flag
added or removed is VM_LOCKED itself, and calls to shmem_lock() on an inode
are already serialized by the caller.
Take info->lock out of the chain and the possibility of deadlock or
lockdep warning goes away.
Fixes: 4595ef88d136 ("shmem: make shmem_inode_info::lock irq-safe")
Reported-by: syzbot+c8a8197c8852f566b9d9@syzkaller.appspotmail.com
Reported-by: syzbot+40b71e145e73f78f81ad@syzkaller.appspotmail.com
Signed-off-by: Hugh Dickins
Signed-off-by: Andrew Morton
Acked-by: Yang Shi
Cc: Yang Shi
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2004161707410.16322@eggly.anvils
Link: https://lore.kernel.org/lkml/000000000000e5838c05a3152f53@google.com/
Link: https://lore.kernel.org/lkml/0000000000003712b305a331d3b1@google.com/
Signed-off-by: Linus Torvalds
-
Some optimizers don't notice that shmem_punch_compound() is always true
(PageTransCompound() being false) without CONFIG_TRANSPARENT_HUGEPAGE==y.
Use IS_ENABLED to help them avoid the BUILD_BUG inside HPAGE_PMD_NR.
Fixes: 71725ed10c40 ("mm: huge tmpfs: try to split_huge_page() when punching hole")
Reported-by: Randy Dunlap
Signed-off-by: Hugh Dickins
Signed-off-by: Andrew Morton
Tested-by: Randy Dunlap
Acked-by: Randy Dunlap
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2004142339170.10035@eggly.anvils
Signed-off-by: Linus Torvalds
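The shape of the fix, as a fragment at the top of shmem_punch_compound() (sketched from the description; the exact placement may differ):

    if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
        return true;    /* compiler can now discard the HPAGE_PMD_NR users below */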
10 Apr, 2020
1 commit
-
…x") into android-mainline
Baby steps on the way to 5.7-rc1
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I89095a90046a14eab189aab257a75b3dfdb5b1db
08 Apr, 2020
3 commits
-
…ux/kernel/git/vgupta/arc") into android-mainline
Steps along the 5.7-rc1 merge.
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ib9f87147ac3d81985496818b0c61bdd086140eed
-
Convert the various /* fallthrough */ comments to the pseudo-keyword
fallthrough;
Done via script:
https://lore.kernel.org/lkml/b56602fcf79f849e733e7b521bb0e17895d390fa.1582230379.git.joe@perches.com/
Signed-off-by: Joe Perches
Signed-off-by: Andrew Morton
Reviewed-by: Gustavo A. R. Silva
Link: http://lkml.kernel.org/r/f62fea5d10eb0ccfc05d87c242a620c261219b66.camel@perches.com
Signed-off-by: Linus Torvalds
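A sketch of the mechanical conversion (the surrounding switch and the count_retry() helper are illustrative, not taken from a real call site):

switch (err) {
case -EBUSY:
    count_retry();          /* illustrative helper */
    /* fallthrough */       /* old style: a comment some tools cannot see */
case -EAGAIN:
    retry = true;
    break;
}

/* becomes */

switch (err) {
case -EBUSY:
    count_retry();
    fallthrough;            /* new style: compiler-checked pseudo-keyword */
case -EAGAIN:
    retry = true;
    break;
}
-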
Yang Shi writes:
Currently, when truncating a shmem file, if the range is partly in a THP
(start or end is in the middle of THP), the pages actually will just get
cleared rather than being freed, unless the range covers the whole THP.
Even though all the subpages are truncated (randomly or sequentially), the
THP may still be kept in page cache.
This might be fine for some usecases which prefer preserving THP, but
balloon inflation is handled in base page size. So when using shmem THP
as memory backend, QEMU inflation actually doesn't work as expected since
it doesn't free memory. But the inflation usecase really needs to get the
memory freed. (Anonymous THP will also not get freed right away, but will
be freed eventually when all subpages are unmapped: whereas shmem THP
still stays in page cache.)
Split THP right away when doing partial hole punch, and if split fails
just clear the page so that read of the punched area will return zeroes.
Hugh Dickins adds:
Our earlier "team of pages" huge tmpfs implementation worked in the way
that Yang Shi proposes; and we have been using this patch to continue to
split the huge page when hole-punched or truncated, since converting over
to the compound page implementation. Although huge tmpfs gives out huge
pages when available, if the user specifically asks to truncate or punch a
hole (perhaps to free memory, perhaps to reduce the memcg charge), then
the filesystem should do so as best it can, splitting the huge page.
That is not always possible: any additional reference to the huge page
prevents split_huge_page() from succeeding, so the result can be flaky.
But in practice it works successfully enough that we've not seen any
problem from that.
Add shmem_punch_compound() to encapsulate the decision of when a split is
needed, and doing the split if so. Using this simplifies the flow in
shmem_undo_range(); and the first (trylock) pass does not need to do any
page clearing on failure, because the second pass will either succeed or
do that clearing. Following the example of zero_user_segment() when
clearing a partial page, add flush_dcache_page() and set_page_dirty() when
clearing a hole - though I'm not certain that either is needed.
But: split_huge_page() would be sure to fail if shmem_undo_range()'s
pagevec holds further references to the huge page. The easiest way to fix
that is for find_get_entries() to return early, as soon as it has put one
compound head or tail into the pagevec. At first this felt like a hack;
but on examination, this convention better suits all its callers - or will
do, if the slight one-page-per-pagevec slowdown in shmem_unlock_mapping()
and shmem_seek_hole_data() is transformed into a 512-page-per-pagevec
speedup by checking for compound pages there.
Signed-off-by: Hugh Dickins
Signed-off-by: Andrew Morton
Cc: Yang Shi
Cc: Alexander Duyck
Cc: "Michael S. Tsirkin"
Cc: David Hildenbrand
Cc: "Kirill A. Shutemov"
Cc: Matthew Wilcox
Cc: Andrea Arcangeli
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2002261959020.10801@eggly.anvils
Signed-off-by: Linus Torvalds
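A sketch of the decision helper described above (condition details approximated):

/* Return true if the page can be truncated as-is; false means the caller
 * must fall back to clearing the punched range within the huge page. */
static bool shmem_punch_compound(struct page *page, pgoff_t start, pgoff_t end)
{
    if (!PageTransCompound(page))
        return true;                        /* not huge: just truncate */

    /* Whole huge page inside the punched range? Truncate it whole. */
    if (PageHead(page) &&
        page->index >= start && page->index + HPAGE_PMD_NR <= end)
        return true;

    /* Otherwise try to split, so the hole can really be punched. */
    return split_huge_page(page) >= 0;
}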