06 Nov, 2015
40 commits
-
We have had trouble in the past from the way in which page migration's
newpage is initialized in dribs and drabs - see commit 8bdd63809160 ("mm:
fix direct reclaim writeback regression") which proposed a cleanup.
We have no actual problem now, but I think the procedure would be clearer
(and alternative get_new_page pools safer to implement) if we assert that
newpage is not touched until we are sure that it's going to be used -
except for taking the trylock on it in __unmap_and_move().
So shift the early initializations from move_to_new_page() into
migrate_page_move_mapping(), mapping and NULL-mapping paths. Similarly
migrate_huge_page_move_mapping(), but its NULL-mapping path can just be
deleted: you cannot reach hugetlbfs_migrate_page() with a NULL mapping.
Adjust stages 3 to 8 in the Documentation file accordingly.
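A rough sketch of the reshuffled NULL-mapping path, as this description
suggests it ends up in migrate_page_move_mapping() (abbreviated; details
may differ from the actual patch):
  if (!mapping) {
          /* Anonymous page without mapping */
          if (page_count(page) != expected_count)
                  return -EAGAIN;
          /* No turning back from here: newpage may now be initialized */
          newpage->index = page->index;
          newpage->mapping = page->mapping;
          if (PageSwapBacked(page))
                  SetPageSwapBacked(newpage);
          return MIGRATEPAGE_SUCCESS;
  }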
Signed-off-by: Hugh Dickins
Cc: Christoph Lameter
Cc: "Kirill A. Shutemov"
Cc: Rik van Riel
Cc: Vlastimil Babka
Cc: Davidlohr Bueso
Cc: Oleg Nesterov
Cc: Sasha Levin
Cc: Dmitry Vyukov
Cc: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Hitherto page migration has avoided using a migration entry for a
swapcache page mapped into userspace, apparently for historical reasons.
So any page blessed with swapcache would entail a minor fault when it's
next touched, which page migration otherwise tries to avoid. Swapcache in
an mlocked area is rare, so won't often matter, but still better fixed.
Just rearrange the block in try_to_unmap_one(), to handle TTU_MIGRATION
before checking PageAnon, that's all (apart from some reindenting).
Well, no, that's not quite all: doesn't this by the way fix a soft_dirty
bug, that page migration of a file page was forgetting to transfer the
soft_dirty bit? Probably not a serious bug: if I understand correctly,
soft_dirty aficionados usually have to handle file pages separately
anyway; but we publish the bit in /proc/<pid>/pagemap on file mappings as
well as anonymous, so page migration ought not to perturb it.
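Roughly, the rearranged migration-entry block in try_to_unmap_one() then
reads (sketch only; note the soft_dirty transfer now applies to file
pages too):
  entry = make_migration_entry(page, pte_write(pteval));
  swp_pte = swp_entry_to_pte(entry);
  if (pte_soft_dirty(pteval))
          swp_pte = pte_swp_mksoft_dirty(swp_pte);
  set_pte_at(mm, address, pte, swp_pte);
Signed-off-by: Hugh Dickins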
Cc: Christoph Lameter
Cc: "Kirill A. Shutemov"
Cc: Rik van Riel
Cc: Vlastimil Babka
Cc: Davidlohr Bueso
Cc: Oleg Nesterov
Cc: Sasha Levin
Cc: Dmitry Vyukov
Cc: KOSAKI Motohiro
Reviewed-by: Cyrill Gorcunov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
__unmap_and_move() contains a long stale comment on page_get_anon_vma()
and PageSwapCache(), with an odd control flow that's hard to follow.
Mostly this reflects our confusion about the lifetime of an anon_vma, in
the early days of page migration, before we could take a reference to one.
Nowadays this seems quite straightforward: cut it all down to essentials.
I cannot see the relevance of swapcache here at all, so don't treat it any
differently: I believe the old comment reflects in part our anon_vma
confusions, and in part the original v2.6.16 page migration technique,
which used actual swap to migrate anon instead of swap-like migration
entries. Why should a swapcache page not be migrated with the aid of
migration entry ptes like everything else? So lose that comment now, and
enable migration entries for swapcache in the next patch.
Signed-off-by: Hugh Dickins
Cc: Christoph Lameter
Cc: "Kirill A. Shutemov"
Cc: Rik van Riel
Cc: Vlastimil Babka
Cc: Davidlohr Bueso
Cc: Oleg Nesterov
Cc: Sasha Levin
Cc: Dmitry Vyukov
Cc: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Clean up page migration a little more by calling remove_migration_ptes()
from the same level, on success or on failure, from __unmap_and_move() or
from unmap_and_move_huge_page().
Don't reset page->mapping of a PageAnon old page in move_to_new_page(),
leave that to when the page is freed. Except for here in page migration,
it has been an invariant that a PageAnon (bit set in page->mapping) page
stays PageAnon until it is freed, and I think we're safer to keep to that.
And with the above rearrangement, it's necessary because zap_pte_range()
wants to identify whether a migration entry represents a file or an anon
page, to update the appropriate rss stats without waiting on it.
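For reference, the PageAnon invariant relies on the bottom bit of
page->mapping tagging an anon_vma pointer, roughly:
  /* page->mapping holds (anon_vma | PAGE_MAPPING_ANON) for anon pages */
  static inline int PageAnon(struct page *page)
  {
          return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
  }
Signed-off-by: Hugh Dickins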
Cc: Christoph Lameter
Cc: "Kirill A. Shutemov"
Cc: Rik van Riel
Cc: Vlastimil Babka
Cc: Davidlohr Bueso
Cc: Oleg Nesterov
Cc: Sasha Levin
Cc: Dmitry Vyukov
Cc: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Clean up page migration a little by moving the trylock of newpage from
move_to_new_page() into __unmap_and_move(), where the old page has been
locked. Adjust unmap_and_move_huge_page() and balloon_page_migrate()
accordingly.
But make one kind-of-functional change on the way: whereas trylock of
newpage used to BUG() if it failed, now simply return -EAGAIN if so.
Cutting out BUG()s is good, right? But, to be honest, this is really to
extend the usefulness of the custom put_new_page feature, allowing a pool
of new pages to be shared perhaps with racing uses.
Use an "else" instead of that "skip_unmap" label.
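A minimal sketch of the kind-of-functional change (the out_unlock label
here is illustrative):
  if (unlikely(!trylock_page(newpage))) {
          rc = -EAGAIN;           /* was: BUG_ON(!trylock_page(newpage)) */
          goto out_unlock;        /* unlock old page, retry later */
  }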
Signed-off-by: Hugh Dickins
Cc: Christoph Lameter
Cc: "Kirill A. Shutemov"
Cc: Rik van Riel
Cc: Vlastimil Babka
Cc: Davidlohr Bueso
Cc: Oleg Nesterov
Cc: Sasha Levin
Cc: Dmitry Vyukov
Cc: KOSAKI Motohiro
Acked-by: Rafael Aquini
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
I don't know of any problem from the way it's used in our current tree,
but there is one defect in page migration's custom put_new_page feature.
An unused newpage is expected to be released with the put_new_page(), but
there was one MIGRATEPAGE_SUCCESS (0) path which released it with
putback_lru_page(): which can be very wrong for a custom pool.
Fixed more easily by resetting put_new_page once it won't be needed, than
by adding a further flag to modify the rc test.
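The fix amounts to something like this in unmap_and_move() (sketch):
  if (rc == MIGRATEPAGE_SUCCESS)
          put_new_page = NULL;    /* newpage now in use, don't release it */
  /* later, when releasing an unused newpage: */
  if (put_new_page)
          put_new_page(newpage, private);
  else
          putback_lru_page(newpage);
Signed-off-by: Hugh Dickins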
Cc: Christoph Lameter
Cc: "Kirill A. Shutemov"
Cc: Rik van Riel
Acked-by: Vlastimil Babka
Cc: Davidlohr Bueso
Cc: Oleg Nesterov
Cc: Sasha Levin
Cc: Dmitry Vyukov
Cc: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
It's migrate.c not migration,c, and nowadays putback_movable_pages() not
putback_lru_pages().
Signed-off-by: Hugh Dickins
Cc: Christoph Lameter
Cc: "Kirill A. Shutemov"
Cc: Rik van Riel
Cc: Vlastimil Babka
Cc: Davidlohr Bueso
Cc: Oleg Nesterov
Cc: Sasha Levin
Cc: Dmitry Vyukov
Cc: KOSAKI Motohiro
Acked-by: Rafael Aquini
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
After v4.3's commit 0610c25daa3e ("memcg: fix dirty page migration")
mem_cgroup_migrate() doesn't have much to offer in page migration: convert
migrate_misplaced_transhuge_page() to set_page_memcg() instead.
Then rename mem_cgroup_migrate() to mem_cgroup_replace_page(), since its
remaining callers are replace_page_cache_page() and shmem_replace_page():
both of which passed lrucare true, so just eliminate that argument.
Signed-off-by: Hugh Dickins
Cc: Christoph Lameter
Cc: "Kirill A. Shutemov"
Cc: Rik van Riel
Cc: Vlastimil Babka
Cc: Davidlohr Bueso
Cc: Oleg Nesterov
Cc: Sasha Levin
Cc: Dmitry Vyukov
Cc: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Commit e6c509f85455 ("mm: use clear_page_mlock() in page_remove_rmap()")
in v3.7 inadvertently made mlock_migrate_page() impotent: page migration
unmaps the page from userspace before migrating, and that commit clears
PageMlocked on the final unmap, leaving mlock_migrate_page() with
nothing to do. Not a serious bug, the next attempt at reclaiming the
page would fix it up; but a betrayal of page migration's intent - the
new page ought to emerge as PageMlocked.
I don't see how to fix it for mlock_migrate_page() itself; but easily
fixed in remove_migration_pte(), by calling mlock_vma_page() when the vma
is VM_LOCKED - under pte lock as in try_to_unmap_one().
Delete mlock_migrate_page()? Not quite, it does still serve a purpose for
migrate_misplaced_transhuge_page(): where we could replace it by a test,
clear_page_mlock(), mlock_vma_page() sequence; but would that be an
improvement? mlock_migrate_page() is fairly lean, and let's make it
leaner by skipping the irq save/restore now clearly not needed.
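The remove_migration_pte() side of the fix, roughly (the pte lock is
already held there):
  if (vma->vm_flags & VM_LOCKED)
          mlock_vma_page(new);    /* new page emerges PageMlocked */
Signed-off-by: Hugh Dickins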
Cc: Christoph Lameter
Cc: "Kirill A. Shutemov"
Cc: Rik van Riel
Acked-by: Vlastimil Babka
Cc: Davidlohr Bueso
Cc: Oleg Nesterov
Cc: Sasha Levin
Cc: Dmitry Vyukov
Cc: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
KernelThreadSanitizer (ktsan) has shown that the down_read_trylock() of
mmap_sem in try_to_unmap_one() (when going to set PageMlocked on a page
found mapped in a VM_LOCKED vma) is ineffective against races with
exit_mmap()'s munlock_vma_pages_all(), because mmap_sem is not held when
tearing down an mm.
But that's okay, those races are benign; and although we've believed for
years in that ugly down_read_trylock(), it's unsuitable for the job, and
frustrates the good intention of setting PageMlocked when it fails.
It just doesn't matter if here we read vm_flags an instant before or after
a racing mlock() or munlock() or exit_mmap() sets or clears VM_LOCKED: the
syscalls (or exit) work their way up the address space (taking pt locks
after updating vm_flags) to establish the final state.
We do still need to be careful never to mark a page Mlocked (hence
unevictable) by any race that will not be corrected shortly after. The
page lock protects from many of the races, but not all (a page is not
necessarily locked when it's unmapped). But the pte lock we just dropped
is good to cover the rest (and serializes even with
munlock_vma_pages_all(), so no special barriers required): now hold on to
the pte lock while calling mlock_vma_page(). Is that lock ordering safe?
Yes, that's how follow_page_pte() calls it, and how page_remove_rmap()
calls the complementary clear_page_mlock().
This fixes the following case (though not a case which anyone has
complained of), which mmap_sem did not: truncation's preliminary
unmap_mapping_range() is supposed to remove even the anonymous COWs of
filecache pages, and that might race with try_to_unmap_one() on a
VM_LOCKED vma, so that mlock_vma_page() sets PageMlocked just after
zap_pte_range() unmaps the page, causing "Bad page state (mlocked)" when
freed. The pte lock protects against this.
You could say that it also protects against the more ordinary case, racing
with the preliminary unmapping of a filecache page itself: but in our
current tree, that's independently protected by i_mmap_rwsem; and that
race would be why "Bad page state (mlocked)" was seen before commit
48ec833b7851 ("Revert mm/memory.c: share the i_mmap_rwsem").Vlastimil Babka points out another race which this patch protects against.
try_to_unmap_one() might reach its mlock_vma_page() TestSetPageMlocked a
moment after munlock_vma_pages_all() did its Phase 1 TestClearPageMlocked:
leaving PageMlocked and unevictable when it should be evictable. mmap_sem
is ineffective because exit_mmap() does not hold it; page lock ineffective
because __munlock_pagevec() only takes it afterwards, in Phase 2; pte lock
is effective because __munlock_pagevec_fill() takes it to get the page,
after VM_LOCKED was cleared from vm_flags, so visible to try_to_unmap_one.
Kirill Shutemov points out that if the compiler chooses to implement a
"vma->vm_flags &= VM_WHATEVER" or "vma->vm_flags |= VM_WHATEVER" operation
with an intermediate store of unrelated bits set, since I'm here foregoing
its usual protection by mmap_sem, try_to_unmap_one() might catch sight of
a spurious VM_LOCKED in vm_flags, and make the wrong decision. This does
not appear to be an immediate problem, but we may want to define vm_flags
accessors in future, to guard against such a possibility.
While we're here, make a related optimization in try_to_unmap_one(): if
it's doing TTU_MUNLOCK, then there's no point at all in descending the
page tables and getting the pt lock, unless the vma is VM_LOCKED. Yes,
that can change racily, but it can change racily even without the
optimization: it's not critical. Far better not to waste time here.
Stopped short of separating try_to_munlock_one() from try_to_unmap_one()
on this occasion, but that's probably the sensible next step - with a
rename, given that try_to_munlock()'s business is to try to set Mlocked.
Updated the unevictable-lru Documentation, to remove its reference to mmap
semaphore, but found a few more updates needed in just that area.
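A sketch of the two try_to_unmap_one() changes described above
(abbreviated; SWAP_MLOCK and out_unmap as in this era's mm/rmap.c):
  if ((flags & TTU_MUNLOCK) && !(vma->vm_flags & VM_LOCKED))
          goto out;               /* munlock has nothing to do here */
  /* and, once the pte lock is taken: */
  if (vma->vm_flags & VM_LOCKED) {
          /* Holding pte lock, we do *not* need mmap_sem here */
          mlock_vma_page(page);
          ret = SWAP_MLOCK;
          goto out_unmap;         /* which drops the pte lock */
  }
Signed-off-by: Hugh Dickins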
Cc: Christoph Lameter
Cc: "Kirill A. Shutemov"
Cc: Rik van Riel
Acked-by: Vlastimil Babka
Cc: Davidlohr Bueso
Cc: Oleg Nesterov
Cc: Sasha Levin
Cc: Dmitry Vyukov
Cc: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
While updating some mm Documentation, I came across a few straggling
references to the non-linear vmas which were happily removed in v4.0.
Delete them.
Signed-off-by: Hugh Dickins
Cc: Christoph Lameter
Cc: "Kirill A. Shutemov"
Cc: Rik van Riel
Acked-by: Vlastimil Babka
Cc: Davidlohr Bueso
Cc: Oleg Nesterov
Cc: Sasha Levin
Cc: Dmitry Vyukov
Cc: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
If ALLOC_SPLIT_PTLOCKS is defined, ptlock_init may fail, in which case we
shouldn't increment NR_PAGETABLE.
Since small allocations, such as ptlock, normally do not fail (currently
they can fail if kmemcg is used though), this patch does not really fix
anything and should be considered a code cleanup.
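The constructor then reads something like (sketch):
  static inline bool pgtable_page_ctor(struct page *page)
  {
          if (!ptlock_init(page))
                  return false;   /* don't count a table we failed to set up */
          inc_zone_page_state(page, NR_PAGETABLE);
          return true;
  }
Signed-off-by: Vladimir Davydov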
Acked-by: Kirill A. Shutemov
Acked-by: Michal Hocko
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Don't build clear_soft_dirty_pmd() if transparent huge pages are not
enabled.
Signed-off-by: Laurent Dufour
Reviewed-by: Aneesh Kumar K.V
Cc: Pavel Emelyanov
Cc: Michael Ellerman
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
As mentioned in the commit 56eecdb912b5 ("mm: Use ptep/pmdp_set_numa()
for updating _PAGE_NUMA bit"), architectures like ppc64 don't do tlb
flush in set_pte/pmd functions.
So when dealing with an existing pte in clear_soft_dirty(), the pte must be
cleared before being modified.
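So clear_soft_dirty() goes through a clear-then-commit sequence, roughly:
  ptent = ptep_modify_prot_start(vma->vm_mm, addr, pte);  /* clears the pte */
  ptent = pte_wrprotect(ptent);
  ptent = pte_clear_soft_dirty(ptent);
  ptep_modify_prot_commit(vma->vm_mm, addr, pte, ptent);
Signed-off-by: Laurent Dufour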
Reviewed-by: Aneesh Kumar K.V
Cc: Pavel Emelyanov
Cc: Michael Ellerman
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
get_mergeable_page() can only return NULL (also in case of errors) or
the pinned mergeable page. It can't return an error different from NULL.
This optimizes away the unnecessary error check.
Add a return after the "out:" label in the callee to make it more
readable.
Signed-off-by: Andrea Arcangeli
Acked-by: Hugh Dickins
Cc: Petr Holasek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Doing the VM_MERGEABLE check after the page == kpage check won't provide
any meaningful benefit. The !vma->anon_vma check of find_mergeable_vma is
the only superfluous bit in using find_mergeable_vma because the !PageAnon
check of try_to_merge_one_page() implicitly checks for that, but it still
looks cleaner to share the same find_mergeable_vma().
Signed-off-by: Andrea Arcangeli
Acked-by: Hugh Dickins
Cc: Petr Holasek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This just uses the helper function to clean up the assumption on the
hlist_node internals.
Signed-off-by: Andrea Arcangeli
Acked-by: Hugh Dickins
Cc: Petr Holasek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The stable_nodes can become stale at any time if the underlying pages get
freed. The stable_node gets collected and removed from the stable rbtree
if that is detected during the rbtree lookups.
Don't fail the lookup if running into stale stable_nodes, just restart the
lookup after collecting the stale stable_nodes. Otherwise the CPU spent
in the preparation stage is wasted and the lookup must be repeated at the
next loop potentially failing a second time in a second stale stable_node.
If we don't prune aggressively we delay the merging of the unstable node
candidates and at the same time we delay the freeing of the stale
stable_nodes. Keeping stale stable_nodes around wastes memory and it
can't provide any benefit.
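The lookup then restarts instead of failing, along these lines (sketch;
get_ksm_page() returns NULL for a stale stable_node after collecting it):
again:
  /* rbtree walk finds a candidate stable_node */
  tree_page = get_ksm_page(stable_node, false);
  if (!tree_page)
          goto again;     /* stale node collected: retry the walk */
Signed-off-by: Andrea Arcangeli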
Acked-by: Hugh Dickins
Cc: Petr Holasek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
While at it, add it to the file and anon walks too.
Signed-off-by: Andrea Arcangeli
Acked-by: Hugh Dickins
Cc: Petr Holasek
Acked-by: Davidlohr Bueso
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Before the previous patch ("memcg: unify slab and other kmem pages
charging"), __mem_cgroup_from_kmem had to handle two types of kmem - slab
pages and pages allocated with alloc_kmem_pages - because slab pages did
not store information about the owner memcg in the page struct. Now we
can unify it. Since, after it, this function becomes tiny, we can fold it
into mem_cgroup_from_kmem.
[hughd@google.com: move mem_cgroup_from_kmem into list_lru.c]
Signed-off-by: Vladimir Davydov
Acked-by: Michal Hocko
Cc: Johannes Weiner
Cc: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
We have memcg_kmem_charge and memcg_kmem_uncharge methods for charging and
uncharging kmem pages to memcg, but currently they are not used for
charging slab pages (i.e. they are only used for charging pages allocated
with alloc_kmem_pages). The only reason why the slab subsystem uses
special helpers, memcg_charge_slab and memcg_uncharge_slab, is that it
needs to charge to the memcg of kmem cache while memcg_charge_kmem charges
to the memcg that the current task belongs to.
To remove this diversity, this patch adds an extra argument to
__memcg_kmem_charge that can be a pointer to a memcg or NULL. If it is
not NULL, the function tries to charge to the memcg it points to,
otherwise it charges to the current context. Next, it makes the slab
subsystem use this function to charge slab pages.
Since memcg_charge_kmem and memcg_uncharge_kmem helpers are now used only
in __memcg_kmem_charge and __memcg_kmem_uncharge, they are inlined. Since
__memcg_kmem_charge stores a pointer to the memcg in the page struct, we
don't need memcg_uncharge_slab anymore and can use free_kmem_pages.
Besides, one can now detect which memcg a slab page belongs to by reading
/proc/kpagecgroup.
Note, this patch switches slab to charge-after-alloc design. Since this
design is already used for all other memcg charges, it should not make any
difference.
[hannes@cmpxchg.org: better to have an outer function than a magic parameter for the memcg lookup]
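Per that suggestion, the outer function might look like this (sketch; the
inner __memcg_kmem_charge_memcg name is assumed from the description):
  int __memcg_kmem_charge(struct page *page, gfp_t gfp, int order)
  {
          struct mem_cgroup *memcg;
          int ret;

          memcg = get_mem_cgroup_from_mm(current->mm);
          ret = __memcg_kmem_charge_memcg(page, gfp, order, memcg);
          css_put(&memcg->css);
          return ret;
  }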
Signed-off-by: Vladimir Davydov
Acked-by: Michal Hocko
Signed-off-by: Johannes Weiner
Cc: Christoph Lameter
Cc: Pekka Enberg
Cc: David Rientjes
Cc: Joonsoo Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Charging kmem pages proceeds in two steps. First, we try to charge the
allocation size to the memcg the current task belongs to, then we allocate
a page and "commit" the charge storing the pointer to the memcg in the
page struct.
Such a design looks overcomplicated, because there is not much sense in
trying to charge the allocation before actually allocating a page: we
won't
be able to consume much memory over the limit even if we charge after
doing the actual allocation, besides we already charge user pages post
factum, so being pedantic with kmem pages just looks pointless.
So this patch simplifies the design by merging the "charge" and the
"commit" steps into the same function, which takes the allocated page.Also, rename the charge and uncharge methods to memcg_kmem_charge and
memcg_kmem_uncharge and make the charge method return error code instead
of bool to conform to mem_cgroup_try_charge.
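So a kmem page allocation now reads, roughly:
  page = alloc_pages(gfp_mask, order);
  if (page && memcg_kmem_charge(page, gfp_mask, order)) {
          __free_pages(page, order);      /* charge failed: undo */
          page = NULL;
  }
Signed-off-by: Vladimir Davydov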
Acked-by: Michal Hocko
Cc: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
If kernelcore was not specified, or the kernelcore size is zero
(required_movablecore >= totalpages), or the kernelcore size is larger
than totalpages, there is no ZONE_MOVABLE. We should fill the zone with
both kernel memory and movable memory.
Signed-off-by: Xishi Qiu
Reviewed-by: Yasuaki Ishimatsu
Cc: Mel Gorman
Cc: David Rientjes
Cc: Tang Chen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This function is called in very hot paths and merely does a few loads for
a validity check. Let's inline it, so that we can save the function call
overhead.
(akpm: this is cosmetic - the compiler already inlines vmacache_valid_mm())
Signed-off-by: Davidlohr Bueso
Cc: Sergey Senozhatsky
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Change HIGHMEM_ZONE to be the same as the DMA_ZONE macro.
Signed-off-by: yalin wang
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Srinivas Kandagatla reported bad page messages when trying to remove the
bottom 2MB on an ARM based IFC6410 board:
BUG: Bad page state in process swapper pfn:fffa8
page:ef7fb500 count:0 mapcount:0 mapping: (null) index:0x0
flags: 0x96640253(locked|error|dirty|active|arch_1|reclaim|mlocked)
page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
bad because of flags:
flags: 0x200041(locked|active|mlocked)
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 3.19.0-rc3-00007-g412f9ba-dirty #816
Hardware name: Qualcomm (Flattened Device Tree)
unwind_backtrace
show_stack
dump_stack
bad_page
free_pages_prepare
free_hot_cold_page
__free_pages
free_highmem_page
mem_init
start_kernel
Disabling lock debugging due to kernel taint
Removing the lower 2MB meant the start of the lowmem zone was no longer
page block aligned. IFC6410 uses CONFIG_FLATMEM where alloc_node_mem_map
allocates memory for the mem_map. alloc_node_mem_map will offset for
unaligned nodes with the assumption the pfn/page translation functions
will account for the offset. The functions for CONFIG_FLATMEM do not
offset however, resulting in overrunning the memmap array. Just use the
allocated memmap without any offset when running with CONFIG_FLATMEM to
avoid the overrun.
Signed-off-by: Laura Abbott
Reported-by: Srinivas Kandagatla
Tested-by: Srinivas Kandagatla
Acked-by: Vlastimil Babka
Tested-by: Bjorn Andersson
Cc: Santosh Shilimkar
Cc: Russell King
Cc: Kevin Hilman
Cc: Arnd Bergman
Cc: Stephen Boyd
Cc: Andy Gross
Cc: Mel Gorman
Cc: Steven Rostedt
Cc: Dave Hansen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
With x86_64 (config http://ozlabs.org/~akpm/config-akpm2.txt) and old gcc
(4.4.4), drivers/base/node.c:node_read_meminfo() is using 2344 bytes of
stack. Uninlining node_page_state() reduces this to 440 bytes.
The stack consumption issue is fixed by newer gcc (4.8.4); however, with
that compiler this patch reduces the node.o text size from 7314 bytes to
4578 bytes.
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Make __install_special_mapping()'s argument order match the caller's, so
the caller can pass its register arguments straight through to the callee.
On most architectures the arguments (at least the first five) are passed
in registers, so this change has an effect on most architectures.
With -O2, __install_special_mapping() may be inlined on most
architectures, but with -Os it should not be. So this change can give a
little better performance for -Os, at least.
Signed-off-by: Chen Gang
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
(1) For !CONFIG_BUG cases, the bug call is a no-op, so we couldn't
care less and the change is ok.
(2) ppc and mips, which HAVE_ARCH_BUG_ON, do not rely on branch
predictions as it seems to be pointless [1] and thus callers should not
be trying to push an optimization in the first place.
(3) For CONFIG_BUG and !HAVE_ARCH_BUG_ON cases, BUG_ON() contains an
unlikely compiler flag already.
Hence, we can drop unlikely behind BUG_ON().
[1] http://lkml.iu.edu/hypermail/linux/kernel/1101.3/02289.html
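That is, with an illustrative condition:
  BUG_ON(unlikely(!ptr));         /* before: redundant hint */
  BUG_ON(!ptr);                   /* after: BUG_ON() applies unlikely() itself */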
Signed-off-by: Geliang Tang
Acked-by: Davidlohr Bueso
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
When fget() fails we can return -EBADF directly.
Signed-off-by: Chen Gang
Acked-by: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
It is still a little better to remove it, although it should be skipped
by "-O2".
Signed-off-by: Chen Gang
Acked-by: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This came up when implementing HIGHMEM/PAE40 for ARC. The kmap() /
kmap_atomic() generated code seemed needlessly bloated due to the way the
PageHighMem() macro is implemented. It derives the exact zone for the
page and then does pointer subtraction with the first zone to infer the
zone_type. The pointer arithmetic in turn generates the code bloat.
PageHighMem(page)
  is_highmem(page_zone(page))
    zone_off = (char *)zone - (char *)zone->zone_pgdat->node_zones
Instead use is_highmem_idx() to work on the zone_type available in page
flags, e.g. for ARC kmap():
----- Before -----
80756348: mov_s r13,r0
8075634a: ld_s r2,[r13,0]
8075634c: lsr_s r2,r2,30
8075634e: mpy r2,r2,0x2a4
80756352: add_s r2,r2,0x80aef880
80756358: ld_s r3,[r2,28]
8075635a: sub_s r2,r2,r3
8075635c: breq r2,0x2a4,80756378
80756364: breq r2,0x548,80756378
----- After -----
80756330: mov_s r13,r0
80756332: ld_s r2,[r13,0]
80756334: lsr_s r2,r2,30
80756336: sub_s r2,r2,1
80756338: brlo r2,2,80756348
For x86 defconfig build (32 bit only) it saves around 900 bytes.
For ARC defconfig with HIGHMEM, it saved around 2K bytes.
---->8-------
./scripts/bloat-o-meter x86/vmlinux-defconfig-pre x86/vmlinux-defconfig-post
add/remove: 0/0 grow/shrink: 0/36 up/down: 0/-934 (-934)
function old new delta
saveable_page 162 154 -8
saveable_highmem_page 154 146 -8
skb_gro_reset_offset 147 131 -16
...
...
__change_page_attr_set_clr 1715 1678 -37
setup_data_read 434 394 -40
mon_bin_event 1967 1927 -40
swsusp_save 1148 1105 -43
_set_pages_array 549 493 -56
---->8-------
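The change itself is tiny; the page-flags side amounts to (sketch):
  /* was: #define PageHighMem(__p) is_highmem(page_zone(__p)) */
  #define PageHighMem(__p) is_highmem_idx(page_zonenum(__p))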
Signed-off-by: Vineet Gupta
Acked-by: Michal Hocko
Cc: Naoya Horiguchi
Cc: Hugh Dickins
Cc: "Kirill A. Shutemov"
Cc: Michal Hocko
Cc: Jennifer Herbert
Cc: Konstantin Khlebnikov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Both "child->mm == mm" and "p->mm != mm" checks in oom_kill_process() are
wrong. task->mm can be NULL if the task is the exited group leader. This
means in particular that "kill sharing same memory" loop can miss a
process with a zombie leader which uses the same ->mm.
Note: the process_shares_mm(child, p->mm) check is still not 100% correct,
p->mm can be NULL too. This is minor, but probably deserves a fix or a
comment anyway.
[akpm@linux-foundation.org: document process_shares_mm() a bit]
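The helper reads roughly as follows (sketch, modulo the comment akpm
added):
  static bool process_shares_mm(struct task_struct *p, struct mm_struct *mm)
  {
          struct task_struct *t;

          for_each_thread(p, t) {
                  struct mm_struct *t_mm = READ_ONCE(t->mm);
                  if (t_mm)
                          return t_mm == mm;
          }
          return false;
  }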
Signed-off-by: Oleg Nesterov
Acked-by: David Rientjes
Acked-by: Michal Hocko
Cc: Tetsuo Handa
Cc: Kyle Walker
Cc: Stanislav Kozina
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Purely cosmetic, but the complex "if" condition looks annoying to me.
Especially because it is not consistent with the OOM_SCORE_ADJ_MIN check,
which adds another if/continue.
Signed-off-by: Oleg Nesterov
Acked-by: David Rientjes
Acked-by: Michal Hocko
Acked-by: Hillf Danton
Cc: Tetsuo Handa
Cc: Kyle Walker
Cc: Stanislav Kozina
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The fatal_signal_pending() was added to suppress unnecessary "sharing same
memory" message, but it can't 100% help anyway because it can be
false-negative; SIGKILL can already be dequeued.
And worse, it can be false-positive due to exec or coredump. exec is
mostly fine, but coredump is not. It is possible that the group leader
has the pending SIGKILL because its sub-thread originated the coredump, in
this case we must not skip this process.
We could probably add the additional ->group_exit_task check but this
patch just removes the wrong check along with pr_info().
Signed-off-by: Oleg Nesterov
Acked-by: David Rientjes
Acked-by: Tetsuo Handa
Acked-by: Michal Hocko
Cc: Kyle Walker
Cc: Stanislav Kozina
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Cosmetic, but expand_upwards() and expand_downwards() overuse vma->vm_mm;
a local variable makes sense imho.
Signed-off-by: Oleg Nesterov
Acked-by: Hugh Dickins
Cc: Andrey Konovalov
Cc: Davidlohr Bueso
Cc: "Kirill A. Shutemov"
Cc: Sasha Levin
Cc: Vlastimil Babka
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
"mm->locked_vm += grow" and vm_stat_account() in acct_stack_growth() are
not safe; multiple threads using the same ->mm can do this at the same
time trying to expand different vma's under down_read(mmap_sem). This
means that one of the "locked_vm += grow" changes can be lost and we can
miss munlock_vma_pages_all() later.
Move this code into the caller(s) under mm->page_table_lock. All other
updates to ->locked_vm hold mmap_sem for writing.
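In expand_upwards(), for example, the accounting moves inside the
existing page_table_lock section, roughly:
  spin_lock(&mm->page_table_lock);
  if (vma->vm_flags & VM_LOCKED)
          mm->locked_vm += grow;  /* now serialized against other growers */
  vm_stat_account(mm, vma->vm_flags, vma->vm_file, grow);
  vma->vm_end = address;
  spin_unlock(&mm->page_table_lock);
Signed-off-by: Oleg Nesterov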
Acked-by: Hugh Dickins
Cc: Andrey Konovalov
Cc: Davidlohr Bueso
Cc: "Kirill A. Shutemov"
Cc: Sasha Levin
Cc: Vlastimil Babka
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
If the user set "movablecore=xx" to a large number, corepages will
overflow. Fix the problem.
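A sketch of the guard (corepages and required_movablecore are unsigned
longs, so the subtraction must not go negative):
  required_movablecore = min(totalpages, required_movablecore);
  corepages = totalpages - required_movablecore;
Signed-off-by: Xishi Qiu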
Reviewed-by: Yasuaki Ishimatsu
Acked-by: Tang Chen
Acked-by: David Rientjes
Cc: Mel Gorman
Cc: Tang Chen
Cc: Zhang Yanfei
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
In zone_reclaimable_pages(), `nr' is returned by a function which is
declared as returning "unsigned long", so declare it such. Negative
values are meaningless here.
In zone_pagecache_reclaimable() we should also declare `delta' and
`nr_pagecache_reclaimable' as being unsigned longs because they're used to
store the values returned by zone_page_state() and
zone_unmapped_file_pages() which also happen to return unsigned integers.
[akpm@linux-foundation.org: make zone_pagecache_reclaimable() return ulong rather than long]
Signed-off-by: Alexandru Moise
Acked-by: Michal Hocko
Cc: Vladimir Davydov
Cc: Johannes Weiner
Cc: Vlastimil Babka
Cc: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The oom killer takes task_lock() in a couple of places solely to protect
printing the task's comm.
A process's comm, including current's comm, may change due to
/proc/pid/comm or PR_SET_NAME.
The comm will always be NULL-terminated, so the worst race scenario would
only be during update. We can tolerate a comm being printed that is in
the middle of an update to avoid taking the lock.
Other locations in the kernel have already dropped task_lock() when
printing comm, so this is consistent.
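So a kill report can print comm lock-free (illustrative message only):
  pr_err("Killed process %d (%s)\n", task_pid_nr(p), p->comm);
Signed-off-by: David Rientjes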
Suggested-by: Oleg Nesterov
Cc: Michal Hocko
Cc: Vladimir Davydov
Cc: Sergey Senozhatsky
Acked-by: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds