09 Sep, 2017

40 commits

  • Allow interval trees to quickly check for overlaps to avoid unnecessary
    tree lookups in interval_tree_iter_first().

    As of this patch, all interval tree flavors will require using a
    'rb_root_cached' such that we can have the leftmost node easily
    available. While most users will make use of this feature, those with
    special functions (in addition to the generic insert, delete, search
    calls) will avoid using the cached option as they can do funky things
    with insertions -- for example, vma_interval_tree_insert_after().
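
    A minimal sketch of the quick checks this enables, using a hypothetical
    node layout (the real code lives in the INTERVAL_TREE_DEFINE() template
    and keys off the augmented subtree-last value plus the cached leftmost
    node):

    /* hypothetical node: interval [start, last] plus the augmented maximum */
    struct it_node {
            struct rb_node rb;
            unsigned long start, last, subtree_last;
    };

    static struct it_node *it_iter_first(struct rb_root_cached *root,
                                         unsigned long start, unsigned long last)
    {
            struct it_node *node, *leftmost;

            if (!root->rb_root.rb_node)
                    return NULL;

            /* [start, last] overlaps [b0, b1] iff start <= b1 && b0 <= last */
            node = rb_entry(root->rb_root.rb_node, struct it_node, rb);
            if (node->subtree_last < start)         /* everything ends before us */
                    return NULL;

            leftmost = rb_entry(rb_first_cached(root), struct it_node, rb);
            if (leftmost->start > last)             /* everything starts after us */
                    return NULL;

            /* ... otherwise do the usual subtree-last guided descent ... */
            return node;                            /* placeholder for the real walk */
    }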

    [jglisse@redhat.com: fix deadlock from typo vm_lock_anon_vma()]
    Link: http://lkml.kernel.org/r/20170808225719.20723-1-jglisse@redhat.com
    Link: http://lkml.kernel.org/r/20170719014603.19029-12-dave@stgolabs.net
    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Jérôme Glisse
    Acked-by: Christian König
    Acked-by: Peter Zijlstra (Intel)
    Acked-by: Doug Ledford
    Acked-by: Michael S. Tsirkin
    Cc: David Airlie
    Cc: Jason Wang
    Cc: Christian Benvenuti
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • ... with the generic rbtree flavor instead. No changes
    in semantics whatsoever.

    Link: http://lkml.kernel.org/r/20170719014603.19029-11-dave@stgolabs.net
    Signed-off-by: Davidlohr Bueso
    Reviewed-by: Jan Kara
    Acked-by: Peter Zijlstra (Intel)
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • ... with the generic rbtree flavor instead. No changes
    in semantics whatsoever.

    Link: http://lkml.kernel.org/r/20170719014603.19029-10-dave@stgolabs.net
    Signed-off-by: Davidlohr Bueso
    Acked-by: Peter Zijlstra (Intel)
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • ... with the generic rbtree flavor instead. No changes
    in semantics whatsoever.

    Link: http://lkml.kernel.org/r/20170719014603.19029-9-dave@stgolabs.net
    Signed-off-by: Davidlohr Bueso
    Acked-by: Peter Zijlstra (Intel)
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • ... with the generic rbtree flavor instead. No changes
    in semantics whatsoever.

    Link: http://lkml.kernel.org/r/20170719014603.19029-8-dave@stgolabs.net
    Signed-off-by: Davidlohr Bueso
    Acked-by: Peter Zijlstra (Intel)
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • We can work with a single rb_root_cached root to test both cached and
    non-cached rbtrees. In addition, add a test to measure latencies between
    rb_first() and its fast counterpart.

    Link: http://lkml.kernel.org/r/20170719014603.19029-7-dave@stgolabs.net
    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • This adds a second test dedicated to regular rb-tree testing, as there
    is no need to repeat it for the augmented flavor.

    Link: http://lkml.kernel.org/r/20170719014603.19029-6-dave@stgolabs.net
    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Allows for more flexible debugging.

    Link: http://lkml.kernel.org/r/20170719014603.19029-5-dave@stgolabs.net
    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • While overall the code is very nicely commented, it might not be
    immediately obvious from the diagrams what is going on. Add a very
    brief summary of each case. Opposite cases where the node is the left
    child are left untouched.

    Link: http://lkml.kernel.org/r/20170719014603.19029-4-dave@stgolabs.net
    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • The only times the nil-parent (root node) condition is true are when the
    node is the first in the tree, or when case 1 rebalancing (done while
    fixing rbtree rule #4) has made the node the root. Such conditions do
    not apply most of the time:

    (i) The common case in an rbtree is to have more than a single node,
    so this is only true for the first rb_insert().

    (ii) While there is a chance only one first rotation is needed, cases
    where the node's uncle is black (cases 2,3) are more common as we can
    have the following scenarios during the rotation looping:

    case1 only, case1+1, case2+3, case1+2+3, case3 only, etc.

    This patch, therefore, adds an unlikely() optimization to this
    conditional. When profiling with CONFIG_PROFILE_ANNOTATED_BRANCHES, a
    kernel build shows that the incorrect rate is less than 15%, and
    insert-mostly workloads tend over time to have an incorrect rate below
    2%.
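
    For reference, the branch in question ends up looking roughly like this
    inside __rb_insert()'s rebalancing loop (a simplified excerpt, not the
    full function):

    if (unlikely(!parent)) {
            /*
             * The inserted node is root: either it is the first node, or
             * case 1 rebalancing recursed all the way up to the root.
             */
            rb_set_parent_color(node, NULL, RB_BLACK);
            break;
    }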

    Link: http://lkml.kernel.org/r/20170719014603.19029-3-dave@stgolabs.net
    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Patch series "rbtree: Cache leftmost node internally", v4.

    A series extending rbtrees to internally cache the leftmost node so that
    we can have a fast overlap-check optimization for all interval tree
    users[1]. The benefits of this series are:

    (i) Unify users that do internal leftmost node caching.
    (ii) Optimize all interval tree users.
    (iii) Convert at least two new users (epoll and procfs) to the new interface.

    This patch (of 16):

    Red-black tree semantics imply that nodes with smaller or greater (or
    equal for duplicates) keys are always to the left and right,
    respectively. For the kernel this is extremely evident when considering
    our rb_first() semantics. Enabling lookups for the smallest node in the
    tree in O(1) can save a good chunk of cycles by not having to walk down
    the tree each time. To this end there are a few core users that
    explicitly do this, such as the scheduler and rtmutexes. There is also
    the desire for interval trees to have this optimization, allowing faster
    overlap checking.

    This patch introduces a new 'struct rb_root_cached', which is just the
    root with a cached pointer to the leftmost node. The reason a new
    structure was added, rather than extending the regular rb_root, is that
    this lets users choose between memory footprint and actual tree
    performance. The new wrappers on top of the regular rb_root calls are:

    - rb_first_cached(cached_root) -- which is a fast replacement
    for rb_first.

    - rb_insert_color_cached(node, cached_root, new)

    - rb_erase_cached(node, cached_root)

    In addition, augmented cached interfaces are also added for basic
    insertion and deletion operations, which becomes important for the
    interval tree changes.

    With the exception of the inserts, which add a bool for updating the
    new leftmost, the interfaces are kept the same. To this end, porting rb
    users to the cached version is trivial, and keeping the current rbtree
    semantics for users that don't care about the optimization comes at zero
    overhead.
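
    A brief usage sketch of the new interface (the node type and key below
    are hypothetical; the caller tells the insert whether the new node
    became the leftmost):

    struct my_node {
            struct rb_node rb;
            unsigned long key;
    };

    static struct rb_root_cached my_root = RB_ROOT_CACHED;

    static void my_insert(struct my_node *new)
    {
            struct rb_node **link = &my_root.rb_root.rb_node, *parent = NULL;
            bool leftmost = true;

            while (*link) {
                    struct my_node *cur = rb_entry(*link, struct my_node, rb);

                    parent = *link;
                    if (new->key < cur->key) {
                            link = &parent->rb_left;
                    } else {
                            link = &parent->rb_right;
                            leftmost = false;       /* went right at least once */
                    }
            }

            rb_link_node(&new->rb, parent, link);
            rb_insert_color_cached(&new->rb, &my_root, leftmost);
    }

    static void my_erase(struct my_node *node)
    {
            /* keeps the cache valid; rb_first_cached(&my_root) stays O(1) */
            rb_erase_cached(&node->rb, &my_root);
    }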

    Link: http://lkml.kernel.org/r/20170719014603.19029-2-dave@stgolabs.net
    Signed-off-by: Davidlohr Bueso
    Reviewed-by: Jan Kara
    Acked-by: Peter Zijlstra (Intel)
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • GENMASK(_ULL) performs a left-shift of ~0UL(L), which technically
    results in an integer overflow. clang raises a warning if the overflow
    occurs in a preprocessor expression. Clear the low-order bits through a
    subtraction instead of the left-shift to avoid the overflow.

    (akpm: no change in .text size in my testing)
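
    For reference, the UL flavor ends up roughly as follows (the ULL variant
    is analogous):

    /* (~0UL) - (1UL << (l)) + 1 clears the low l bits without shifting ~0UL */
    #define GENMASK(h, l) \
            (((~0UL) - (1UL << (l)) + 1) & (~0UL >> (BITS_PER_LONG - 1 - (h))))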

    Link: http://lkml.kernel.org/r/20170803212020.24939-1-mka@chromium.org
    Signed-off-by: Matthias Kaehlcke
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthias Kaehlcke
     
  • We have seen some generic code use the config parameter
    CONFIG_CPU_BIG_ENDIAN to decide the endianness.

    Here are a few examples:
    include/asm-generic/qrwlock.h
    drivers/of/base.c
    drivers/of/fdt.c
    drivers/tty/serial/earlycon.c
    drivers/tty/serial/serial_core.c

    Display a warning if CPU_BIG_ENDIAN is not defined on a big endian
    architecture, and also warn if it is defined on a little endian
    architecture.
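
    A minimal sketch of the kind of consistency check described
    (illustrative only, not the exact hunk that was merged):

    #if defined(__BIG_ENDIAN) && !defined(CONFIG_CPU_BIG_ENDIAN)
    #warning inconsistent configuration, needs CONFIG_CPU_BIG_ENDIAN
    #endif

    #if defined(__LITTLE_ENDIAN) && defined(CONFIG_CPU_BIG_ENDIAN)
    #warning inconsistent configuration, CONFIG_CPU_BIG_ENDIAN is set on a little endian build
    #endif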

    Here is our original discussion
    https://lkml.org/lkml/2017/5/24/620

    Link: http://lkml.kernel.org/r/1499358861-179979-4-git-send-email-babu.moger@oracle.com
    Signed-off-by: Babu Moger
    Suggested-by: Arnd Bergmann
    Acked-by: Geert Uytterhoeven
    Cc: "James E.J. Bottomley"
    Cc: Alexander Viro
    Cc: David S. Miller
    Cc: Greg KH
    Cc: Helge Deller
    Cc: Ingo Molnar
    Cc: Jonas Bonn
    Cc: Max Filippov
    Cc: Michael Ellerman (powerpc)
    Cc: Michal Simek
    Cc: Peter Zijlstra
    Cc: Stafford Horne
    Cc: Stefan Kristiansson
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Babu Moger
     
  • The microblaze architecture can be configured for either little or big
    endian formats. Add a choice option for the user to select the correct
    endian format (defaulting to big endian).

    Also update the Makefile so the toolchain compiles for the format it is
    configured for.

    Link: http://lkml.kernel.org/r/1499358861-179979-3-git-send-email-babu.moger@oracle.com
    Signed-off-by: Babu Moger
    Signed-off-by: Arnd Bergmann
    Cc: Michal Simek
    Cc: "James E.J. Bottomley"
    Cc: Alexander Viro
    Cc: David S. Miller
    Cc: Geert Uytterhoeven
    Cc: Greg KH
    Cc: Helge Deller
    Cc: Ingo Molnar
    Cc: Jonas Bonn
    Cc: Max Filippov
    Cc: Michael Ellerman (powerpc)
    Cc: Peter Zijlstra
    Cc: Stafford Horne
    Cc: Stefan Kristiansson
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Babu Moger
     
  • Patch series "Define CPU_BIG_ENDIAN or warn for inconsistencies", v3.

    While working on enabling queued rwlocks on SPARC, we found the
    following code in include/asm-generic/qrwlock.h, which uses
    CONFIG_CPU_BIG_ENDIAN to clear a byte:

    static inline u8 *__qrwlock_write_byte(struct qrwlock *lock)
    {
            return (u8 *)lock + 3 * IS_BUILTIN(CONFIG_CPU_BIG_ENDIAN);
    }

    The problem is that many of the fixed big endian architectures don't
    define CPU_BIG_ENDIAN, so the wrong byte gets cleared.

    Define CPU_BIG_ENDIAN for all the fixed big endian architectures to fix
    it.

    We also found a few more references to this config parameter in:
    drivers/of/base.c
    drivers/of/fdt.c
    drivers/tty/serial/earlycon.c
    drivers/tty/serial/serial_core.c

    Be aware that this may cause regressions if someone has already worked
    around problems in the above code; if so, remove the work-around.

    Here is our original discussion
    https://lkml.org/lkml/2017/5/24/620

    Link: http://lkml.kernel.org/r/1499358861-179979-2-git-send-email-babu.moger@oracle.com
    Signed-off-by: Babu Moger
    Suggested-by: Arnd Bergmann
    Acked-by: Geert Uytterhoeven
    Acked-by: David S. Miller
    Acked-by: Stafford Horne
    Cc: Yoshinori Sato
    Cc: Jonas Bonn
    Cc: Stefan Kristiansson
    Cc: "James E.J. Bottomley"
    Cc: Helge Deller
    Cc: Alexander Viro
    Cc: Michal Simek
    Cc: Michael Ellerman (powerpc)
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Max Filippov
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Babu Moger
     
  • First, the number of CPUs can't be a negative number.

    Second, different signedness leads to suboptimal code in the following
    cases:

    1)
    kmalloc(nr_cpu_ids * sizeof(X));

    "int" has to be sign extended to size_t.

    2)
    while (loff_t *pos < nr_cpu_ids)

    MOVSXD is 1 byte longer than the same MOV.

    Other cases exist as well. Basically the compiler is told that
    nr_cpu_ids can't be negative, which can't be deduced if it is "int".
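
    The change itself essentially boils down to the declaration (sketch):

    extern unsigned int nr_cpu_ids;     /* was: extern int nr_cpu_ids; */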

    Code savings on allyesconfig kernel: -3KB

    add/remove: 0/0 grow/shrink: 25/264 up/down: 261/-3631 (-3370)
    function old new delta
    coretemp_cpu_online 450 512 +62
    rcu_init_one 1234 1272 +38
    pci_device_probe 374 399 +25

    ...

    pgdat_reclaimable_pages 628 556 -72
    select_fallback_rq 446 369 -77
    task_numa_find_cpu 1923 1807 -116

    Link: http://lkml.kernel.org/r/20170819114959.GA30580@avx2
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Where possible, call memset16(), memmove() or memcpy() instead of using
    open-coded loops. I don't like the calling convention that uses a byte
    count instead of a count of u16s, but it's a little late to change that.
    Reduces code size of fbcon.o by almost 400 bytes on my laptop build.

    [akpm@linux-foundation.org: fix build]
    Link: http://lkml.kernel.org/r/20170720184539.31609-9-willy@infradead.org
    Signed-off-by: Matthew Wilcox
    Cc: Ralf Baechle
    Cc: David Miller
    Cc: Sam Ravnborg
    Cc: "H. Peter Anvin"
    Cc: "James E.J. Bottomley"
    Cc: "Martin K. Petersen"
    Cc: Ingo Molnar
    Cc: Ivan Kokshaysky
    Cc: Matt Turner
    Cc: Michael Ellerman
    Cc: Minchan Kim
    Cc: Richard Henderson
    Cc: Russell King
    Cc: Sergey Senozhatsky
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • memset32() can be used to initialise these three arrays. Minor code
    footprint reduction.

    Link: http://lkml.kernel.org/r/20170720184539.31609-8-willy@infradead.org
    Signed-off-by: Matthew Wilcox
    Cc: "James E.J. Bottomley"
    Cc: "Martin K. Petersen"
    Cc: "H. Peter Anvin"
    Cc: David Miller
    Cc: Ingo Molnar
    Cc: Ivan Kokshaysky
    Cc: Matt Turner
    Cc: Michael Ellerman
    Cc: Minchan Kim
    Cc: Ralf Baechle
    Cc: Richard Henderson
    Cc: Russell King
    Cc: Sam Ravnborg
    Cc: Sergey Senozhatsky
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • zram was the motivation for creating memset_l(). Minchan Kim sees a 7%
    performance improvement on x86 with 100MB of non-zero deduplicatable
    data:

    perf stat -r 10 dd if=/dev/zram0 of=/dev/null

    vanilla: 0.232050465 seconds time elapsed ( +- 0.51% )
    memset_l: 0.217219387 seconds time elapsed ( +- 0.07% )
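
    The zram fill path then looks roughly like this (a sketch of the helper,
    assuming a length that is a multiple of sizeof(unsigned long)):

    static void zram_fill_page(void *ptr, unsigned long len, unsigned long value)
    {
            WARN_ON_ONCE(!IS_ALIGNED(len, sizeof(unsigned long)));
            memset_l(ptr, value, len / sizeof(unsigned long));
    }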

    Link: http://lkml.kernel.org/r/20170720184539.31609-7-willy@infradead.org
    Signed-off-by: Matthew Wilcox
    Tested-by: Minchan Kim
    Cc: Sergey Senozhatsky
    Cc: "H. Peter Anvin"
    Cc: "James E.J. Bottomley"
    Cc: "Martin K. Petersen"
    Cc: David Miller
    Cc: Ingo Molnar
    Cc: Ivan Kokshaysky
    Cc: Matt Turner
    Cc: Michael Ellerman
    Cc: Ralf Baechle
    Cc: Richard Henderson
    Cc: Russell King
    Cc: Sam Ravnborg
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Alpha already had an optimised fill-memory-with-16-bit-quantity
    assembler routine called memsetw(). It has a slightly different calling
    convention from memset16() in that it takes a byte count, not a count of
    words. That's the same convention used by ARM's __memset routines, so
    rename Alpha's routine to match and add a memset16() wrapper around it.
    Then convert Alpha's scr_memsetw() to call memset16() instead of
    memsetw().
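
    Conceptually, the wrapper only converts the element count into the byte
    count the assembler routine expects (simplified sketch):

    extern void *__memset16(void *s, unsigned short c, size_t n);  /* n in bytes */

    static inline void *memset16(uint16_t *p, uint16_t v, size_t n) /* n in u16s */
    {
            return __memset16(p, v, n * 2);
    }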

    Link: http://lkml.kernel.org/r/20170720184539.31609-6-willy@infradead.org
    Signed-off-by: Matthew Wilcox
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Matt Turner
    Cc: "H. Peter Anvin"
    Cc: "James E.J. Bottomley"
    Cc: "Martin K. Petersen"
    Cc: David Miller
    Cc: Ingo Molnar
    Cc: Michael Ellerman
    Cc: Minchan Kim
    Cc: Ralf Baechle
    Cc: Russell King
    Cc: Sam Ravnborg
    Cc: Sergey Senozhatsky
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Reuse the existing optimised memset implementation to implement an
    optimised memset32 and memset64.

    Link: http://lkml.kernel.org/r/20170720184539.31609-5-willy@infradead.org
    Signed-off-by: Matthew Wilcox
    Reviewed-by: Russell King
    Cc: "H. Peter Anvin"
    Cc: "James E.J. Bottomley"
    Cc: "Martin K. Petersen"
    Cc: David Miller
    Cc: Ingo Molnar
    Cc: Ivan Kokshaysky
    Cc: Matt Turner
    Cc: Michael Ellerman
    Cc: Minchan Kim
    Cc: Ralf Baechle
    Cc: Richard Henderson
    Cc: Sam Ravnborg
    Cc: Sergey Senozhatsky
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • These are single instructions on x86. There's no 64-bit instruction for
    x86-32, but we don't yet have any user for memset64() on 32-bit
    architectures, so don't bother to implement it.
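
    A sketch of the 32-bit variant as a single string instruction
    (simplified from the x86 string headers; the 16- and 64-bit ones use
    stosw/stosq):

    static inline void *memset32(uint32_t *s, uint32_t v, size_t n)
    {
            long d0, d1;

            asm volatile("rep stosl"
                         : "=&c" (d0), "=&D" (d1)
                         : "a" (v), "1" (s), "0" (n)
                         : "memory");
            return s;
    }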

    Link: http://lkml.kernel.org/r/20170720184539.31609-4-willy@infradead.org
    Signed-off-by: Matthew Wilcox
    Cc: Minchan Kim
    Cc: Michael Ellerman
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: "James E.J. Bottomley"
    Cc: "Martin K. Petersen"
    Cc: David Miller
    Cc: Ivan Kokshaysky
    Cc: Matt Turner
    Cc: Ralf Baechle
    Cc: Richard Henderson
    Cc: Russell King
    Cc: Sam Ravnborg
    Cc: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • [akpm@linux-foundation.org: minor tweaks]
    Link: http://lkml.kernel.org/r/20170720184539.31609-3-willy@infradead.org
    Signed-off-by: Matthew Wilcox
    Cc: "H. Peter Anvin"
    Cc: "James E.J. Bottomley"
    Cc: "Martin K. Petersen"
    Cc: David Miller
    Cc: Ingo Molnar
    Cc: Ivan Kokshaysky
    Cc: Matt Turner
    Cc: Michael Ellerman
    Cc: Minchan Kim
    Cc: Ralf Baechle
    Cc: Richard Henderson
    Cc: Russell King
    Cc: Sam Ravnborg
    Cc: Sergey Senozhatsky
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Patch series "Multibyte memset variations", v4.

    A relatively common idiom we're missing is a function to fill an area of
    memory with a pattern which is larger than a single byte. I first
    noticed this with a zram patch which wanted to fill a page with an
    'unsigned long' value. There turn out to be quite a few places in the
    kernel which can benefit from using an optimised function rather than a
    loop; sometimes text size, sometimes speed, and sometimes both. The
    optimised PowerPC version (not included here) improves performance by
    about 30% on POWER8 on just the raw memset_l().

    Most of the extra lines of code come from the three testcases I added.

    This patch (of 8):

    memset16(), memset32() and memset64() are like memset(), but allow the
    caller to fill the destination with a value larger than a single byte.
    memset_l() and memset_p() allow the caller to use unsigned long and
    pointer values respectively.
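
    The generic fallback is a simple loop; a sketch of the 32-bit variant
    (the 16- and 64-bit versions are analogous, and memset_l()/memset_p()
    simply pick the width matching unsigned long / pointers):

    void *memset32(uint32_t *s, uint32_t v, size_t count)
    {
            uint32_t *xs = s;

            while (count--)
                    *xs++ = v;
            return s;
    }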

    Link: http://lkml.kernel.org/r/20170720184539.31609-2-willy@infradead.org
    Signed-off-by: Matthew Wilcox
    Cc: "H. Peter Anvin"
    Cc: "James E.J. Bottomley"
    Cc: "Martin K. Petersen"
    Cc: David Miller
    Cc: Ingo Molnar
    Cc: Ivan Kokshaysky
    Cc: Matt Turner
    Cc: Michael Ellerman
    Cc: Minchan Kim
    Cc: Ralf Baechle
    Cc: Richard Henderson
    Cc: Russell King
    Cc: Sam Ravnborg
    Cc: Sergey Senozhatsky
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • This macro is useful to avoid link errors on 32-bit systems.

    We have the same definition in two drivers, so move it to
    include/linux/kernel.h.

    While we are here, refactor DIV_ROUND_UP_ULL() by using
    DIV_ROUND_DOWN_ULL().
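
    For reference, the shared definitions end up roughly as follows
    (do_div() divides a 64-bit dividend in place):

    #define DIV_ROUND_DOWN_ULL(ll, d) \
            ({ unsigned long long _tmp = (ll); do_div(_tmp, d); _tmp; })

    #define DIV_ROUND_UP_ULL(ll, d)  DIV_ROUND_DOWN_ULL((ll) + (d) - 1, (d))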

    Link: http://lkml.kernel.org/r/1500945156-12907-1-git-send-email-yamada.masahiro@socionext.com
    Signed-off-by: Masahiro Yamada
    Acked-by: Mark Brown
    Cc: Cyrille Pitchen
    Cc: Jaroslav Kysela
    Cc: Takashi Iwai
    Cc: Liam Girdwood
    Cc: Boris Brezillon
    Cc: Marek Vasut
    Cc: Brian Norris
    Cc: Richard Weinberger
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Masahiro Yamada
     
  • If there are large numbers of hugepages to iterate while reading
    /proc/pid/smaps, the page walk never does cond_resched(). On archs
    without split pmd locks, there can be significant and observable
    contention on mm->page_table_lock, which causes lengthy delays without
    rescheduling.

    Always reschedule in smaps_pte_range() if necessary since the pagewalk
    iteration can be expensive.

    Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1708211405520.131071@chino.kir.corp.google.com
    Signed-off-by: David Rientjes
    Cc: Minchan Kim
    Cc: Hugh Dickins
    Cc: "Kirill A. Shutemov"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • Save some code from ~320 invocations, all of which clear the last
    argument.

    add/remove: 3/0 grow/shrink: 0/158 up/down: 45/-702 (-657)
    function old new delta
    proc_create - 17 +17
    __ksymtab_proc_create - 16 +16
    __kstrtab_proc_create - 12 +12
    yam_init_driver 301 298 -3

    ...

    cifs_proc_init 249 228 -21
    via_fb_pci_probe 2304 2280 -24
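
    The uninlined helper then simply forwards a NULL data argument (sketch):

    struct proc_dir_entry *proc_create(const char *name, umode_t mode,
                                       struct proc_dir_entry *parent,
                                       const struct file_operations *proc_fops)
    {
            return proc_create_data(name, mode, parent, proc_fops, NULL);
    }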

    Link: http://lkml.kernel.org/r/20170819094702.GA27864@avx2
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Commit b18cb64ead40 ("fs/proc: Stop trying to report thread stacks")
    removed the last user of the priv parameter in is_stack(), so the
    argument is redundant. Drop it.

    [arnd@arndb.de: remove unused variable]
    Link: http://lkml.kernel.org/r/20170801120150.1520051-1-arnd@arndb.de
    Link: http://lkml.kernel.org/r/20170728075833.7241-1-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Signed-off-by: Arnd Bergmann
    Cc: Andy Lutomirski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • VMA and its address bounds checks are too late in this function. They
    must have been verified earlier in the page fault sequence. Hence just
    remove them.

    Link: http://lkml.kernel.org/r/20170901130137.7617-1-khandual@linux.vnet.ibm.com
    Signed-off-by: Anshuman Khandual
    Suggested-by: Vlastimil Babka
    Acked-by: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anshuman Khandual
     
  • Free frontswap_map if an error is encountered before enable_swap_info().

    Signed-off-by: David Rientjes
    Reviewed-by: "Huang, Ying"
    Cc: Darrick J. Wong
    Cc: Hugh Dickins
    Cc: [4.12+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • If initializing a small swap file fails because the swap file has a
    problem (holes, etc.) then we need to free the cluster info as part of
    cleanup. Unfortunately a previous patch changed the code to use kvzalloc
    but did not change all the vfree calls to use kvfree.

    Found by running generic/357 from xfstests.
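
    The rule being restored, as an illustrative fragment (nr_clusters is a
    made-up name): anything allocated with kvzalloc() may be kmalloc-backed,
    so it must be released with kvfree(), never vfree():

    struct swap_cluster_info *ci;

    ci = kvzalloc(nr_clusters * sizeof(*ci), GFP_KERNEL);   /* kmalloc or vmalloc backed */
    if (!ci)
            return -ENOMEM;
    /* ... on the error path ... */
    kvfree(ci);                                             /* not vfree() */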

    Link: http://lkml.kernel.org/r/20170831233515.GR3775@magnolia
    Fixes: 54f180d3c181 ("mm, swap: use kvzalloc to allocate some swap data structures")
    Signed-off-by: Darrick J. Wong
    Reviewed-by: "Huang, Ying"
    Acked-by: David Rientjes
    Cc: Hugh Dickins
    Cc: [4.12+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Darrick J. Wong
     
  • We are erroneously initializing alloc_flags before gfp_allowed_mask is
    applied. This could cause problems after pm_restrict_gfp_mask() is
    called during a suspend operation. Apply gfp_allowed_mask before
    initializing alloc_flags so that the first allocation attempt uses the
    correct flags.

    Link: http://lkml.kernel.org/r/201709020016.ADJ21342.OFLJHOOSMFVtFQ@I-love.SAKURA.ne.jp
    Fixes: 83d4ca8148fd9092 ("mm, page_alloc: move __GFP_HARDWALL modifications out of the fastpath")
    Signed-off-by: Tetsuo Handa
    Acked-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Cc: Mel Gorman
    Cc: Jesper Dangaard Brouer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tetsuo Handa
     
  • KCMP's KCMP_EPOLL_TFD mode was merged in commit 0791e3644e5ef2 ("kcmp:
    add KCMP_EPOLL_TFD mode to compare epoll target files"), but we've had
    no selftest for it yet (except on the criu development list). Thus add
    one.
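
    Roughly what the selftest exercises (hedged sketch; epfd, tfd, fd1, pid1
    and pid2 are illustrative, <linux/kcmp.h> provides the slot layout):

    struct kcmp_epoll_slot slot = {
            .efd  = epfd,   /* epoll fd in pid2 */
            .tfd  = tfd,    /* fd number registered in that epoll instance */
            .toff = 0,      /* which occurrence, if added more than once */
    };

    /* 0 means fd1 in pid1 and the described epoll target share the same file */
    ret = syscall(SYS_kcmp, pid1, pid2, KCMP_EPOLL_TFD, fd1, (unsigned long)&slot);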

    Link: http://lkml.kernel.org/r/20170901151620.GK1898@uranus.lan
    Signed-off-by: Cyrill Gorcunov
    Cc: Andrey Vagin
    Cc: Pavel Emelyanov
    Cc: Michael Kerrisk
    Cc: Shuah Khan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cyrill Gorcunov
     
  • online_mem_sections() accidentally marks online only the first section
    in the given range. This is a typo which hasn't been noticed because I
    haven't tested large 2GB blocks previously. All users of
    pfn_to_online_page would get confused about the rest of the pfn range
    in the block.

    All we need to fix this is to use the iterator (pfn) rather than
    start_pfn.
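
    The fix boils down to resolving the section from the loop iterator
    (simplified sketch of the corrected loop):

    for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
            unsigned long section_nr = pfn_to_section_nr(pfn);     /* was: start_pfn */

            /* mark each section in the range, not the first one repeatedly */
            __nr_to_section(section_nr)->section_mem_map |= SECTION_IS_ONLINE;
    }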

    Link: http://lkml.kernel.org/r/20170904112210.3401-1-mhocko@kernel.org
    Fixes: 2d070eab2e82 ("mm: consider zone which is not fully populated to have holes")
    Signed-off-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Cc: Anshuman Khandual
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • Seen while reading the code: in handle_mm_fault(), when
    arch_vma_access_permitted() fails, the matching call to
    mem_cgroup_oom_disable() is never made.

    To fix that, move the call to mem_cgroup_oom_enable() to after the
    arch_vma_access_permitted() check, as we should not have entered the
    memcg OOM domain in the first place.
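
    A sketch of the described ordering (simplified from handle_mm_fault()):

    if (!arch_vma_access_permitted(vma, flags & FAULT_FLAG_WRITE,
                                   flags & FAULT_FLAG_INSTRUCTION,
                                   flags & FAULT_FLAG_REMOTE))
            return VM_FAULT_SIGSEGV;

    /* enter the userland memcg OOM domain only once the access is permitted */
    if (flags & FAULT_FLAG_USER)
            mem_cgroup_oom_enable();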

    Link: http://lkml.kernel.org/r/1504625439-31313-1-git-send-email-ldufour@linux.vnet.ibm.com
    Fixes: bae473a423f6 ("mm: introduce fault_env")
    Signed-off-by: Laurent Dufour
    Acked-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Laurent Dufour
     
  • We've noticed a sizeable performance overhead on some hosts with
    significant network traffic when socket memory accounting is enabled.

    Perf top shows that the socket memory uncharging path is hot:
    2.13% [kernel] [k] page_counter_cancel
    1.14% [kernel] [k] __sk_mem_reduce_allocated
    1.14% [kernel] [k] _raw_spin_lock
    0.87% [kernel] [k] _raw_spin_lock_irqsave
    0.84% [kernel] [k] tcp_ack
    0.84% [kernel] [k] ixgbe_poll
    0.83% < workload >
    0.82% [kernel] [k] enqueue_entity
    0.68% [kernel] [k] __fget
    0.68% [kernel] [k] tcp_delack_timer_handler
    0.67% [kernel] [k] __schedule
    0.60% < workload >
    0.59% [kernel] [k] __inet6_lookup_established
    0.55% [kernel] [k] __switch_to
    0.55% [kernel] [k] menu_select
    0.54% libc-2.20.so [.] __memcpy_avx_unaligned

    To address this issue, the existing per-cpu stock infrastructure can be
    used.

    refill_stock() can be called from mem_cgroup_uncharge_skmem() to move
    charge to a per-cpu stock instead of calling atomic
    page_counter_uncharge().

    To prevent the uncontrolled growth of per-cpu stocks, refill_stock()
    will explicitly drain the cached charge, if the cached value exceeds
    CHARGE_BATCH.
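
    The uncharge side then becomes, roughly (sketch):

    /* in mem_cgroup_uncharge_skmem(), simplified */
    refill_stock(memcg, nr_pages);      /* was: page_counter_uncharge(&memcg->memory, nr_pages) */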

    This allows us to significantly optimize the load:
    1.21% [kernel] [k] _raw_spin_lock
    1.01% [kernel] [k] ixgbe_poll
    0.92% [kernel] [k] _raw_spin_lock_irqsave
    0.90% [kernel] [k] enqueue_entity
    0.86% [kernel] [k] tcp_ack
    0.85% < workload >
    0.74% perf-11120.map [.] 0x000000000061bf24
    0.73% [kernel] [k] __schedule
    0.67% [kernel] [k] __fget
    0.63% [kernel] [k] __inet6_lookup_established
    0.62% [kernel] [k] menu_select
    0.59% < workload >
    0.59% [kernel] [k] __switch_to
    0.57% libc-2.20.so [.] __memcpy_avx_unaligned

    Link: http://lkml.kernel.org/r/20170829100150.4580-1-guro@fb.com
    Signed-off-by: Roman Gushchin
    Acked-by: Johannes Weiner
    Acked-by: Michal Hocko
    Cc: Vladimir Davydov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roman Gushchin
     
  • The fadvise() manpage is silent on fadvise()'s effect on memory-based
    filesystems (shmem, hugetlbfs & ramfs) and pseudo file systems (procfs,
    sysfs, kernfs). The current implementation of fadvise is mostly a noop
    for such filesystems, except for FADV_DONTNEED, which will trigger
    expensive remote LRU cache draining. This patch makes the noop of
    fadvise() on such file systems very explicit.

    However this change has two side effects for ramfs and one for tmpfs.
    First fadvise(FADV_DONTNEED) could remove the unmapped clean zero'ed
    pages of ramfs (allocated through read, readahead & read fault) and
    tmpfs (allocated through read fault). Also fadvise(FADV_WILLNEED) could
    create such clean zero'ed pages for ramfs. This change removes those
    possibilities.

    One of our generic libraries does fadvise(FADV_DONTNEED). Recently we
    observed high latency in fadvise() and noticed that the users had
    started using tmpfs files; the latency was due to expensive remote LRU
    cache draining. For normal tmpfs files (which have data written to
    them), fadvise(FADV_DONTNEED) will always trigger the unneeded remote
    cache draining.

    Link: http://lkml.kernel.org/r/20170818011023.181465-1-shakeelb@google.com
    Signed-off-by: Shakeel Butt
    Cc: Mel Gorman
    Cc: Johannes Weiner
    Cc: Hillf Danton
    Cc: Vlastimil Babka
    Cc: Hugh Dickins
    Cc: Greg Thelen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shakeel Butt
     
  • zs_stat_inc/dec/get() use enum zs_stat_type for the stat type; however,
    some callers pass an enum fullness_group value. Change the type to int
    to reflect the actual use of the functions and get rid of
    'enum-conversion' warnings.

    Link: http://lkml.kernel.org/r/20170731175000.56538-1-mka@chromium.org
    Signed-off-by: Matthias Kaehlcke
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Cc: Doug Anderson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthias Kaehlcke
     
  • page_zone_id() is a specialized function to compare the zone for pages
    that are within the section range. If the sections of the pages are
    different, page_zone_id() can differ even if their zone is the same.
    This wrong usage doesn't cause any actual problem since
    __munlock_pagevec_fill() would be called again with the failed index.
    However, it's better to use a more appropriate function here.

    Link: http://lkml.kernel.org/r/1503559211-10259-1-git-send-email-iamjoonsoo.kim@lge.com
    Signed-off-by: Joonsoo Kim
    Acked-by: Vlastimil Babka
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • To avoid deviation, the per-cpu NUMA stat deltas in vm_numa_stat_diff[]
    are included when a user *reads* the NUMA stats.

    Since the NUMA stats are not read frequently by users, and the kernel
    does not need them to make decisions, making the readers more expensive
    is not a problem.
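
    A sketch of the reader-side folding (assuming a snapshot helper of this
    shape):

    static unsigned long zone_numa_state_snapshot(struct zone *zone,
                                                  enum numa_stat_item item)
    {
            long x = atomic_long_read(&zone->vm_numa_stat[item]);
            int cpu;

            /* fold in the not-yet-flushed per-cpu deltas at read time */
            for_each_online_cpu(cpu)
                    x += per_cpu_ptr(zone->pageset, cpu)->vm_numa_stat_diff[item];

            return x;
    }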

    Link: http://lkml.kernel.org/r/1503568801-21305-4-git-send-email-kemi.wang@intel.com
    Signed-off-by: Kemi Wang
    Reported-by: Jesper Dangaard Brouer
    Acked-by: Mel Gorman
    Cc: Aaron Lu
    Cc: Andi Kleen
    Cc: Christopher Lameter
    Cc: Dave Hansen
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Tim Chen
    Cc: Ying Huang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kemi Wang