25 Feb, 2017
40 commits
-
There is which provides macros for various gcc
specific constructs. Eg: __weak for __attribute__((weak)). I've
cleaned all instances of gcc specific attributes with the right macros
for all files under /arch/m68kLink: http://lkml.kernel.org/r/1485540901-1988-3-git-send-email-gidisrael@gmail.com
Signed-off-by: Gideon Israel Dsouza
Cc: Greg Ungerer
Cc: Geert Uytterhoeven
Cc: Paul Gortmaker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Add __mode(x) into compiler-gcc.h as part of a cleanup task I've taken
up, to replace gcc specific attributes with macros.The next patch is a cleanup of the m68k subsystem and it requires a new
macro to wrap __attribute__ ((mode (...)))Link: http://lkml.kernel.org/r/1485540901-1988-2-git-send-email-gidisrael@gmail.com
Signed-off-by: Gideon Israel Dsouza
Cc: Greg Ungerer
Cc: Geert Uytterhoeven
Cc: Paul Gortmaker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The timer APIs this header needs are ktime_get(), ktime_add_us(), and
ktime_compare(). So, including seems enough. This
commit will cut unnecessary header file parsing.Link: http://lkml.kernel.org/r/1481679225-10885-1-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Commit 63159f5dcccb ("uapi: Use __kernel_long_t in struct mq_attr")
changed the types from long to __kernel_long_t, but didn't add a
linux/types.h include. Code that tries to include this header directly
breaks:/usr/include/linux/mqueue.h:26:2: error: unknown type name '__kernel_long_t'
__kernel_long_t mq_flags; /* message queue flags */This also upsets configure tests for this header:
checking linux/mqueue.h usability... no
checking linux/mqueue.h presence... yes
configure: WARNING: linux/mqueue.h: present but cannot be compiled
configure: WARNING: linux/mqueue.h: check for missing prerequisite headers?
configure: WARNING: linux/mqueue.h: see the Autoconf documentation
configure: WARNING: linux/mqueue.h: section "Present But Cannot Be Compiled"
configure: WARNING: linux/mqueue.h: proceeding with the compiler's result
checking for linux/mqueue.h... noLink: http://lkml.kernel.org/r/20170119194644.4403-1-vapier@gentoo.org
Signed-off-by: Mike Frysinger
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Previously, the hidepid parameter was checked by comparing literal
integers 0, 1, 2. Let's add a proper enum for this, to make the
checking more expressive:0 → HIDEPID_OFF
1 → HIDEPID_NO_ACCESS
2 → HIDEPID_INVISIBLEThis changes the internal labelling only, the userspace-facing interface
remains unmodified, and still works with literal integers 0, 1, 2.No functional changes.
Link: http://lkml.kernel.org/r/1484572984-13388-2-git-send-email-djalal@gmail.com
Signed-off-by: Lafcadio Wluiki
Signed-off-by: Djalal Harouni
Acked-by: Kees Cook
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
After staring at this code for a while I've figured using small 2-entry
array describing ARGV and ENVP is the way to address code duplication
critique.Link: http://lkml.kernel.org/r/20170105185724.GA12027@avx2
Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
To make the code clearer, use rb_entry() instead of container_of() to
deal with rbtree.Link: http://lkml.kernel.org/r/4fd1f82818665705ce75c5156a060ae7caa8e0a9.1482160150.git.geliangtang@gmail.com
Signed-off-by: Geliang Tang
Cc: Jan Kara
Cc: Al Viro
Cc: "David S. Miller"
Cc: Juergen Gross
Cc: Dmitry Torokhov
Cc: Seth Forshee
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Given that the arch does not add its own implementations, simply use the
asm-generic/current.h (generic-y) header instead of duplicating code.Link: http://lkml.kernel.org/r/1485992878-4780-2-git-send-email-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso
Cc: Richard Henderson
Cc: Ivan Kokshaysky
Cc: Matt Turner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The build of frv defconfig gives warning:
arch/frv/mb93090-mb00/pci-frv.c:176:5: warning: ignoring return value of 'pci_assign_resource', declared with attribute warn_unused_result
Just print an error message to silence the warning. We can not do much
here on error.Link: http://lkml.kernel.org/r/1484256471-5379-1-git-send-email-sudipm.mukherjee@gmail.com
Signed-off-by: Sudip Mukherjee
Cc: David Howells
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Make a kasan test which uses a SLAB_ACCOUNT slab cache. If the test is
run within a non default memcg, then it uncovers the bug fixed by
"kasan: drain quarantine of memcg slab objects"[1].If run without fix [1] it shows "Slab cache still has objects", and the
kmem_cache structure is leaked.
Here's an unpatched kernel test:$ dmesg -c > /dev/null
$ mkdir /sys/fs/cgroup/memory/test
$ echo $$ > /sys/fs/cgroup/memory/test/tasks
$ modprobe test_kasan 2> /dev/null
$ dmesg | grep -B1 still
[ 123.456789] kasan test: memcg_accounted_kmem_cache allocate memcg accounted object
[ 124.456789] kmem_cache_destroy test_cache: Slab cache still has objectsKernels with fix [1] don't have the "Slab cache still has objects"
warning or the underlying leak.The new test runs and passes in the default (root) memcg, though in the
root memcg it won't uncover the problem fixed by [1].Link: http://lkml.kernel.org/r/1482257462-36948-2-git-send-email-gthelen@google.com
Signed-off-by: Greg Thelen
Reviewed-by: Vladimir Davydov
Cc: Andrey Ryabinin
Cc: Alexander Potapenko
Cc: Dmitry Vyukov
Cc: Christoph Lameter
Cc: Pekka Enberg
Cc: David Rientjes
Cc: Joonsoo Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Per memcg slab accounting and kasan have a problem with kmem_cache
destruction.
- kmem_cache_create() allocates a kmem_cache, which is used for
allocations from processes running in root (top) memcg.
- Processes running in non root memcg and allocating with either
__GFP_ACCOUNT or from a SLAB_ACCOUNT cache use a per memcg
kmem_cache.
- Kasan catches use-after-free by having kfree() and kmem_cache_free()
defer freeing of objects. Objects are placed in a quarantine.
- kmem_cache_destroy() destroys root and non root kmem_caches. It takes
care to drain the quarantine of objects from the root memcg's
kmem_cache, but ignores objects associated with non root memcg. This
causes leaks because quarantined per memcg objects refer to per memcg
kmem cache being destroyed.To see the problem:
1) create a slab cache with kmem_cache_create(,,,SLAB_ACCOUNT,)
2) from non root memcg, allocate and free a few objects from cache
3) dispose of the cache with kmem_cache_destroy() kmem_cache_destroy()
will trigger a "Slab cache still has objects" warning indicating
that the per memcg kmem_cache structure was leaked.Fix the leak by draining kasan quarantined objects allocated from non
root memcg.Racing memcg deletion is tricky, but handled. kmem_cache_destroy() =>
shutdown_memcg_caches() => __shutdown_memcg_cache() => shutdown_cache()
flushes per memcg quarantined objects, even if that memcg has been
rmdir'd and gone through memcg_deactivate_kmem_caches().This leak only affects destroyed SLAB_ACCOUNT kmem caches when kasan is
enabled. So I don't think it's worth patching stable kernels.Link: http://lkml.kernel.org/r/1482257462-36948-1-git-send-email-gthelen@google.com
Signed-off-by: Greg Thelen
Reviewed-by: Vladimir Davydov
Acked-by: Andrey Ryabinin
Cc: Alexander Potapenko
Cc: Dmitry Vyukov
Cc: Christoph Lameter
Cc: Pekka Enberg
Cc: David Rientjes
Cc: Joonsoo Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Commit 31bc3858ea3e ("add automatic onlining policy for the newly added
memory") provides the capability to have added memory automatically
onlined during add, but this appears to be slightly broken.The current implementation uses walk_memory_range() to call
online_memory_block, which uses memory_block_change_state() to online
the memory. Instead, we should be calling device_online() for the
memory block in online_memory_block(). This would online the memory
(the memory bus online routine memory_subsys_online() called from
device_online calls memory_block_change_state()) and properly update the
device struct offline flag.As a result of the current implementation, attempting to remove a memory
block after adding it using auto online fails. This is because doing a
remove, for instanceecho offline > /sys/devices/system/memory/memoryXXX/state
uses device_offline() which checks the dev->offline flag.
Link: http://lkml.kernel.org/r/20170222220744.8119.19687.stgit@ltcalpine2-lp14.aus.stglabs.ibm.com
Signed-off-by: Nathan Fontenot
Cc: Michael Ellerman
Cc: Michael Roth
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
With rw_page, page_endio is used for completing IO on a page and it
propagates write error to the address space if the IO fails. The
problem is it accesses page->mapping directly which might be okay for
file-backed pages but it shouldn't for anonymous page. Otherwise, it
can corrupt one of field from anon_vma under us and system goes panic
randomly.swap_writepage
bdev_writepage
ops->rw_pageI encountered the BUG during developing new zram feature and it was
really hard to figure it out because it made random crash, somtime
mmap_sem lockdep, sometime other places where places never related to
zram/zsmalloc, and not reproducible with some configuration.When I consider how that bug is subtle and people do fast-swap test with
brd, it's worth to add stable mark, I think.Fixes: dd6bd0d9c7db ("swap: use bdev_read_page() / bdev_write_page()")
Signed-off-by: Minchan Kim
Acked-by: Michal Hocko
Cc: Matthew Wilcox
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
We are using the wrong flag value in task_numa_falt function. This can
result in us doing wrong numa fault statistics update, because we update
num_pages_migrate and numa_fault_locality etc based on the flag argument
passed.Fixes: bae473a423 ("mm: introduce fault_env")
Link: http://lkml.kernel.org/r/1487498395-9544-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V
Acked-by: Hillf Danton
Acked-by: Kirill A. Shutemov
Cc: Rik van Riel
Cc: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Do the prot_none/FOLL_NUMA check after we are sure this is a THP pte.
Archs can implement prot_none such that it can return true for regular
pmd entries.Link: http://lkml.kernel.org/r/1487498326-8734-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V
Cc: Rik van Riel
Cc: Mel Gorman
Cc: Hillf Danton
Cc: Kirill A. Shutemov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
cleanup rest of dma_addr_t and phys_addr_t type casting in mm
use %pad for dma_addr_t
use %pa for phys_addr_tLink: http://lkml.kernel.org/r/1486618489-13912-1-git-send-email-miles.chen@mediatek.com
Signed-off-by: Miles Chen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The class index and fullness group are not encoded in
(first)page->mapping any more, after commit 3783689a1aa8 ("zsmalloc:
introduce zspage structure"). Instead, they are store in struct zspage.Just delete this unneeded comment.
Link: http://lkml.kernel.org/r/1486620822-36826-1-git-send-email-xieyisheng1@huawei.com
Signed-off-by: Yisheng Xie
Suggested-by: Sergey Senozhatsky
Reviewed-by: Sergey Senozhatsky
Acked-by: Minchan Kim
Cc: Nitin Gupta
Cc: Hanjun Guo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
arch_zone_lowest/highest_possible_pfn[] is set to 0 and [ZONE_MOVABLE]
is skipped in the loop. No need to reset them to 0 again.This patch just removes the redundant code.
Link: http://lkml.kernel.org/r/20170209141731.60208-1-richard.weiyang@gmail.com
Signed-off-by: Wei Yang
Cc: Anshuman Khandual
Cc: Mel Gorman
Cc: Michal Hocko
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
We had used page->lru to link the component pages (except the first
page) of a zspage, and used INIT_LIST_HEAD(&page->lru) to init it.
Therefore, to get the last page's next page, which is NULL, we had to
use page flag PG_Private_2 to identify it.But now, we use page->freelist to link all of the pages in zspage and
init the page->freelist as NULL for last page, so no need to use
PG_Private_2 anymore.This remove redundant SetPagePrivate2 in create_page_chain and
ClearPagePrivate2 in reset_page(). Save a few cycles for migration of
zsmalloc page :)Link: http://lkml.kernel.org/r/1487076509-49270-1-git-send-email-xieyisheng1@huawei.com
Signed-off-by: Yisheng Xie
Reviewed-by: Sergey Senozhatsky
Acked-by: Minchan Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
At the end of a window period, if the reclaimed pages is greater than
scanned, an unsigned underflow can result in a huge pressure value and
thus a critical event. Reclaimed pages is found to go higher than
scanned because of the addition of reclaimed slab pages to reclaimed in
shrink_node without a corresponding increment to scanned pages.Minchan Kim mentioned that this can also happen in the case of a THP
page where the scanned is 1 and reclaimed could be 512.Link: http://lkml.kernel.org/r/1486641577-11685-1-git-send-email-vinmenon@codeaurora.org
Signed-off-by: Vinayak Menon
Acked-by: Minchan Kim
Acked-by: Michal Hocko
Cc: Johannes Weiner
Cc: Mel Gorman
Cc: Vlastimil Babka
Cc: Rik van Riel
Cc: Vladimir Davydov
Cc: Anton Vorontsov
Cc: Shiraz Hashim
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Remove the prototypes for shmem_mapping() and shmem_zero_setup() from
linux/mm.h, since they are already provided in linux/shmem_fs.h. But
shmem_fs.h must then provide the inline stub for shmem_mapping() when
CONFIG_SHMEM is not set, and a few more cfiles now need to #include it.Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1702081658250.1549@eggly.anvils
Signed-off-by: Hugh Dickins
Cc: Johannes Weiner
Cc: Michal Simek
Cc: Michael Ellerman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
When @node_reclaim_node isn't 0, the page allocator tries to reclaim
pages if the amount of free memory in the zones are below the low
watermark. On Power platform, none of NUMA nodes are scanned for page
reclaim because no nodes match the condition in zone_allows_reclaim().
On Power platform, RECLAIM_DISTANCE is set to 10 which is the distance
of Node-A to Node-A. So the preferred node even won't be scanned for
page reclaim.__alloc_pages_nodemask()
get_page_from_freelist()
zone_allows_reclaim()Anton proposed the test code as below:
# cat alloc.c
:
int main(int argc, char *argv[])
{
void *p;
unsigned long size;
unsigned long start, end;start = time(NULL);
size = strtoul(argv[1], NULL, 0);
printf("To allocate %ldGB memory\n", size);size < /proc/sys/vm/zone_reclaim_mode; \
sync; \
echo 3 > /proc/sys/vm/drop_caches; \
# taskset -c 0 cat file.32G > /dev/null; \
grep FilePages /sys/devices/system/node/node0/meminfo
Node 0 FilePages: 33619712 kB
# taskset -c 0 ./alloc 128
# grep FilePages /sys/devices/system/node/node0/meminfo
Node 0 FilePages: 33619840 kB
# grep MemFree /sys/devices/system/node/node0/meminfo
Node 0 MemFree: 186816 kBWith the patch applied, the pagecache on node-0 is reclaimed when its
free memory is running out. It's the expected behaviour.# echo 2 > /proc/sys/vm/zone_reclaim_mode; \
sync; \
echo 3 > /proc/sys/vm/drop_caches
# taskset -c 0 cat file.32G > /dev/null; \
grep FilePages /sys/devices/system/node/node0/meminfo
Node 0 FilePages: 33605568 kB
# taskset -c 0 ./alloc 128
# grep FilePages /sys/devices/system/node/node0/meminfo
Node 0 FilePages: 1379520 kB
# grep MemFree /sys/devices/system/node/node0/meminfo
Node 0 MemFree: 317120 kBFixes: 5f7a75acdb24 ("mm: page_alloc: do not cache reclaim distances")
Link: http://lkml.kernel.org/r/1486532455-29613-1-git-send-email-gwshan@linux.vnet.ibm.com
Signed-off-by: Gavin Shan
Acked-by: Mel Gorman
Acked-by: Michal Hocko
Cc: Anton Blanchard
Cc: Michael Ellerman
Cc: [3.16+]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
When mainline introduced commit a96dfddbcc04 ("base/memory, hotplug: fix
a kernel oops in show_valid_zones()"), it obtained the valid start and
end pfn from the given pfn range. The valid start pfn can fix the
actual issue, but it introduced another issue. The valid end pfn will
may exceed the given end_pfn.Although the incorrect overflow will not result in actual problem at
present, but I think it need to be fixed.[toshi.kani@hpe.com: remove assumption that end_pfn is aligned by MAX_ORDER_NR_PAGES]
Fixes: a96dfddbcc04 ("base/memory, hotplug: fix a kernel oops in show_valid_zones()")
Link: http://lkml.kernel.org/r/1486467299-22648-1-git-send-email-zhongjiang@huawei.com
Signed-off-by: zhong jiang
Signed-off-by: Toshi Kani
Cc: Vlastimil Babka
Cc: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The idea is that without doing more calculations we extend zero pages to
same element pages for zram. zero page is special case of same element
page with zero element.1. the test is done under android 7.0
2. startup too many applications circularly
3. sample the zero pages, same pages (none-zero element)
and total pages in function page_zero_filledthe result is listed as below:
ZERO SAME TOTAL
36214 17842 598196ZERO/TOTAL SAME/TOTAL (ZERO+SAME)/TOTAL ZERO/SAME
AVERAGE 0.060631909 0.024990816 0.085622726 2.663825038
STDEV 0.00674612 0.005887625 0.009707034 2.115881328
MAX 0.069698422 0.030046087 0.094975336 7.56043956
MIN 0.03959586 0.007332205 0.056055193 1.928985507from the above data, the benefit is about 2.5% and up to 3% of total
swapout pages.The defect of the patch is that when we recovery a page from non-zero
element the operations are low efficient for partial read.This patch extends zero_page to same_page so if there is any user to
have monitored zero_pages, he will be surprised if the number is
increased but it's not harmful, I believe.[minchan@kernel.org: do not free same element pages in zram_meta_free]
Link: http://lkml.kernel.org/r/20170207065741.GA2567@bbox
Link: http://lkml.kernel.org/r/1483692145-75357-1-git-send-email-zhouxianrong@huawei.com
Link: http://lkml.kernel.org/r/1486307804-27903-1-git-send-email-minchan@kernel.org
Signed-off-by: zhouxianrong
Signed-off-by: Minchan Kim
Cc: Sergey Senozhatsky
Cc: Joonsoo Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The likely/unlikely profiler noticed that the unlikely statement in
wb_domain_writeout_inc() is constantly wrong. This is due to the "not"
(!) being outside the unlikely statement. It is likely that
dom->period_time will be set, but unlikely that it wont be. Move the
not into the unlikely statement.Link: http://lkml.kernel.org/r/20170206120035.3c2e2b91@gandalf.local.home
Signed-off-by: Steven Rostedt (VMware)
Reviewed-by: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
With this our protnone becomes a present pte with READ/WRITE/EXEC bit
cleared. By default we also set _PAGE_PRIVILEGED on such pte. This is
now used to help us identify a protnone pte that as saved write bit.
For such pte, we will clear the _PAGE_PRIVILEGED bit. The pte still
remain non-accessible from both user and kernel.[aneesh.kumar@linux.vnet.ibm.com: v3]
Link: http://lkml.kernel.org/r/1487498625-10891-4-git-send-email-aneesh.kumar@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1487050314-3892-3-git-send-email-aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V
Acked-by: Michael Neuling
Cc: Rik van Riel
Cc: Mel Gorman
Cc: Paul Mackerras
Cc: Benjamin Herrenschmidt
Cc: Michael Ellerman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Without this KSM will consider the page write protected, but a numa
fault can later mark the page writable. This can result in memory
corruption.Link: http://lkml.kernel.org/r/1487498625-10891-3-git-send-email-aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Patch series "Numabalancing preserve write fix", v2.
This patch series address an issue w.r.t THP migration and autonuma
preserve write feature. migrate_misplaced_transhuge_page() cannot deal
with concurrent modification of the page. It does a page copy without
following the migration pte sequence. IIUC, this was done to keep the
migration simpler and at the time of implemenation we didn't had THP
page cache which would have required a more elaborate migration scheme.
That means thp autonuma migration expect the protnone with saved write
to be done such that both kernel and user cannot update the page
content. This patch series enables archs like ppc64 to do that. We are
good with the hash translation mode with the current code, because we
never create a hardware page table entry for a protnone pte.This patch (of 2):
Autonuma preserves the write permission across numa fault to avoid
taking a writefault after a numa fault (Commit: b191f9b106ea " mm: numa:
preserve PTE write permissions across a NUMA hinting fault").
Architecture can implement protnone in different ways and some may
choose to implement that by clearing Read/ Write/Exec bit of pte.
Setting the write bit on such pte can result in wrong behaviour. Fix
this up by allowing arch to override how to save the write bit on a
protnone pte.[aneesh.kumar@linux.vnet.ibm.com: don't mark pte saved write in case of dirty_accountable]
Link: http://lkml.kernel.org/r/1487942884-16517-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com
[aneesh.kumar@linux.vnet.ibm.com: v3]
Link: http://lkml.kernel.org/r/1487498625-10891-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1487050314-3892-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V
Acked-by: Michael Neuling
Cc: Rik van Riel
Cc: Mel Gorman
Cc: Paul Mackerras
Cc: Benjamin Herrenschmidt
Cc: Michael Ellerman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Architectures like ppc64, use privilege access bit to mark pte non
accessible. This implies that kernel can do a copy_to_user to an
address marked for numa fault. This also implies that there can be a
parallel hardware update for the pte. set_pte_at cannot be used in such
scenarios. Hence switch the pte update to use ptep_get_and_clear and
set_pte_at combination.[akpm@linux-foundation.org: remove unwanted ppc change, per Aneesh]
Link: http://lkml.kernel.org/r/1486400776-28114-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V
Acked-by: Rik van Riel
Acked-by: Mel Gorman
Cc: Benjamin Herrenschmidt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Running my likely/unlikely profiler, I discovered that the test in
shmem_write_begin() that tests for info->seals as unlikely, is always
incorrect. This is because shmem_get_inode() sets info->seals to have
F_SEAL_SEAL set by default, and it is unlikely to be cleared when
shmem_write_begin() is called. Thus, the if statement is very likely.But as the if statement block only cares about F_SEAL_WRITE and
F_SEAL_GROW, change the test to only test those two bits.Link: http://lkml.kernel.org/r/20170203105656.7aec6237@gandalf.local.home
Signed-off-by: Steven Rostedt (VMware)
Acked-by: Hugh Dickins
Cc: David Herrmann
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Hillf Danton pointed out that since commit 1d82de618dd ("mm, vmscan:
make kswapd reclaim in terms of nodes") that PGDAT_WRITEBACK is no
longer cleared.It was not noticed as triggering it requires pages under writeback to
cycle twice through the LRU and before kswapd gets stalled.
Historically, such issues tended to occur on small machines writing
heavily to slow storage such as a USB stick.Once kswapd stalls, direct reclaim stalls may be higher but due to the
fact that memory pressure is required, it would not be very noticable.Michal Hocko suggested removing the flag entirely but the conservative
fix is to restore the intended PGDAT_WRITEBACK behaviour and clear the
flag when a suitable zone is balanced.Fixes: 1d82de618ddd ("mm, vmscan: make kswapd reclaim in terms of nodes")
Link: http://lkml.kernel.org/r/20170203203222.gq7hk66yc36lpgtb@suse.de
Signed-off-by: Mel Gorman
Acked-by: Johannes Weiner
Acked-by: Michal Hocko
Acked-by: Hillf Danton
Cc: Minchan Kim
Cc: Rik van Riel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The fault wrappers drm_vm_fault(), drm_vm_shm_fault(),
drm_vm_dma_fault() and drm_vm_sg_fault() used to provide extra logic
beyond what was in the "drm_do_*" versions of these functions, but as of
commit ca0b07d9a969 ("drm: convert drm from nopage to fault") they are
just unnecessary wrappers that do nothing.Remove them, and rename the the drm_do_* fault handlers to remove the
"do_" since they no longer have corresponding wrappers.Link: http://lkml.kernel.org/r/1486155698-25717-1-git-send-email-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler
Reviewed-by: Sean Paul
Acked-by: Daniel Vetter
Cc: Dave Jiang
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Fix whitespace issues, extraneous braces.
Link: http://lkml.kernel.org/r/1485992240-10986-5-git-send-email-me@tobin.cc
Signed-off-by: Tobin C Harding
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Patch fixes sparse warning: Using plain integer as NULL pointer.
Replaces assignment of 0 to pointer with NULL assignment.Link: http://lkml.kernel.org/r/1485992240-10986-2-git-send-email-me@tobin.cc
Signed-off-by: Tobin C Harding
Acked-by: Kirill A. Shutemov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Link: http://lkml.kernel.org/r/20170202011942.1609-1-standby24x7@gmail.com
Signed-off-by: Masanari Iida
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
__vmalloc_area_node() allocates pages to cover the requested vmalloc
size. This can be a lot of memory. If the current task is killed by
the OOM killer, and thus has an unlimited access to memory reserves, it
can consume all the memory theoretically. Fix this by checking for
fatal_signal_pending and back off early.Link: http://lkml.kernel.org/r/20170201092706.9966-4-mhocko@kernel.org
Signed-off-by: Michal Hocko
Reviewed-by: Christoph Hellwig
Cc: Tetsuo Handa
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
There are many reasons of CMA allocation failure such as EBUSY, ENOMEM,
EINTR. But we did not know error reason so far. This patch prints the
error value.Additionally if CONFIG_CMA_DEBUG is enabled, this patch shows bitmap
status to know available pages. Actually CMA internally tries on all
available regions because some regions can be failed because of EBUSY.
Bitmap status is useful to know in detail on both ENONEM and EBUSY;ENOMEM: not tried at all because of no available region
it could be too small total region or could be fragmentation issue
EBUSY: tried some region but all failedThis is an ENOMEM example with this patch.
[2: Binder:714_1: 744] cma: cma_alloc: alloc failed, req-size: 256 pages, ret: -12
If CONFIG_CMA_DEBUG is enabled, avabile pages also will be shown as
concatenated size@position format. So 4@572 means that there are 4
available pages at 572 position starting from 0 position.[2: Binder:714_1: 744] cma: number of available pages: 4@572+7@585+7@601+8@632+38@730+166@1114+127@1921=> 357 free of 2048 total pages
Link: http://lkml.kernel.org/r/1485909785-3952-1-git-send-email-jaewon31.kim@samsung.com
Signed-off-by: Jaewon Kim
Acked-by: Michal Nazarewicz
Cc: Laura Abbott
Cc: Joonsoo Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
If madvise(2) advice will result in the underlying vma being split and
the number of areas mapped by the process will exceed
/proc/sys/vm/max_map_count as a result, return ENOMEM instead of EAGAIN.EAGAIN is returned by madvise(2) when a kernel resource, such as slab,
is temporarily unavailable. It indicates that userspace should retry
the advice in the near future. This is important for advice such as
MADV_DONTNEED which is often used by malloc implementations to free
memory back to the system: we really do want to free memory back when
madvise(2) returns EAGAIN because slab allocations (for vmas, anon_vmas,
or mempolicies) cannot be allocated.Encountering /proc/sys/vm/max_map_count is not a temporary failure,
however, so return ENOMEM to indicate this is a more serious issue. A
followup patch to the man page will specify this behavior.Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1701241431120.42507@chino.kir.corp.google.com
Signed-off-by: David Rientjes
Cc: Jonathan Corbet
Cc: Johannes Weiner
Cc: Jerome Marchand
Cc: "Kirill A. Shutemov"
Cc: Michael Kerrisk
Cc: Anshuman Khandual
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The callers of the DMA alloc functions already provide the proper
context GFP flags. Make sure to pass them through to the CMA allocator,
to make the CMA compaction context aware.Link: http://lkml.kernel.org/r/20170127172328.18574-3-l.stach@pengutronix.de
Signed-off-by: Lucas Stach
Acked-by: Vlastimil Babka
Acked-by: Michal Hocko
Cc: Radim Krcmar
Cc: Catalin Marinas
Cc: Will Deacon
Cc: Chris Zankel
Cc: Ralf Baechle
Cc: Paolo Bonzini
Cc: Alexander Graf
Cc: Joonsoo Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Most users of this interface just want to use it with the default
GFP_KERNEL flags, but for cases where DMA memory is allocated it may be
called from a different context.No functional change yet, just passing through the flag to the
underlying alloc_contig_range function.Link: http://lkml.kernel.org/r/20170127172328.18574-2-l.stach@pengutronix.de
Signed-off-by: Lucas Stach
Acked-by: Vlastimil Babka
Acked-by: Michal Hocko
Cc: Radim Krcmar
Cc: Catalin Marinas
Cc: Will Deacon
Cc: Chris Zankel
Cc: Ralf Baechle
Cc: Paolo Bonzini
Cc: Alexander Graf
Cc: Joonsoo Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds