Eric Lee / smarc-fsl-linux-kernel

15 May, 2019

40 commits

205b20cc5 mm: memcontrol: make cgroup stats and events query API explicitly local

Patch series "mm: memcontrol: memory.stat cost & correctness".

The cgroup memory.stat file holds recursive statistics for the entire
subtree. The current implementation does this tree walk on-demand
whenever the file is read. This is giving us problems in production.

1. The cost of aggregating the statistics on-demand is high. A lot of
system service cgroups are mostly idle and their stats don't change
between reads, yet we always have to check them. There are also always
some lazily-dying cgroups sitting around that are pinned by a handful
of remaining page cache; the same applies to them.

In an application that periodically monitors memory.stat in our
fleet, we have seen the aggregation consume up to 5% CPU time.

2. When cgroups die and disappear from the cgroup tree, so do their
accumulated vm events. The result is that the event counters at
higher-level cgroups can go backwards and confuse some of our
automation, let alone people looking at the graphs over time.

To address both issues, this patch series changes the stat
implementation to spill counts upwards when the counters change.

The upward spilling is batched using the existing per-cpu cache. In a
sparse file stress test with 5 level cgroup nesting, the additional cost
of the flushing was negligible (a little under 1% of CPU at 100% CPU
utilization, compared to the 5% of reading memory.stat during regular
operation).
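
The batched upward spilling described above can be modelled in a few lines. This is a minimal Python sketch, not the kernel implementation: the `Group` class, its field names, and the `BATCH` threshold of 32 are assumptions made for illustration, and a single `pending` value stands in for the per-cpu cache.

```python
BATCH = 32  # hypothetical flush threshold, chosen for illustration


class Group:
    """Toy cgroup with a local count and a recursive (subtree) count."""

    def __init__(self, parent=None):
        self.parent = parent
        self.local = 0      # changes made directly in this group
        self.recursive = 0  # this group plus flushed changes of descendants
        self.pending = 0    # stand-in for the per-cpu cache

    def mod(self, delta):
        """Record a change; spill upwards only once the batch is exceeded."""
        self.local += delta
        self.pending += delta
        if abs(self.pending) > BATCH:
            self.flush()

    def flush(self):
        """Propagate the cached delta to this group and every ancestor."""
        node = self
        while node is not None:
            node.recursive += self.pending
            node = node.parent
        self.pending = 0
```

Reading `recursive` on any ancestor is then a single field access instead of a tree walk, at the cost of missing whatever its descendants have not flushed yet. Because the counts live in the ancestors, they also survive the removal of a child group.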

This patch (of 4):

memcg_page_state(), lruvec_page_state(), memcg_sum_events() are
currently returning the state of the local memcg or lruvec, not the
recursive state.

In practice there is a demand for both versions, although the callers
that want the recursive counts currently sum them up by hand.

By default, cgroups are considered recursive entities and generally we
expect more users of the recursive counters, with the local counts being
special cases. To reflect that in the name, add a _local suffix to the
current implementations.

The following patch will re-incarnate these functions with recursive
semantics, but with an O(1) implementation.

[hannes@cmpxchg.org: fix bisection hole]
Link: http://lkml.kernel.org/r/20190417160347.GC23013@cmpxchg.org
Link: http://lkml.kernel.org/r/20190412151507.2769-2-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner
Reviewed-by: Shakeel Butt
Reviewed-by: Roman Gushchin
Cc: Michal Hocko
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Johannes Weiner
2019-05-15 10:52:53 +0800
6a0243306 drivers/virt/fsl_hypervisor.c: prevent integer overflow in ioctl

The "param.count" value is a u64 that comes from the user. The code
later in the function assumes that param.count is at least one and if
it's not then it leads to an Oops when we dereference the ZERO_SIZE_PTR.

Also the addition can have an integer overflow which would lead us to
allocate a smaller "pages" array than required. I can't immediately
tell what the possible runtime implications are, but it's safest to
prevent the overflow.
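
To make the overflow concrete, here is a hedged Python sketch of the two problems. Python integers do not wrap, so the sketch masks to 64 bits to mimic C u64 arithmetic. The function names, the 4 KiB page size, and the exact form of the check are assumptions for illustration and are not taken from the driver.

```python
U64_MAX = (1 << 64) - 1
PAGE_SHIFT = 12              # assumption: 4 KiB pages
PAGE_SIZE = 1 << PAGE_SHIFT


def num_pages_unchecked(count, offset):
    """Round up to whole pages with u64 wraparound, as unchecked C would."""
    return ((count + offset + PAGE_SIZE - 1) & U64_MAX) >> PAGE_SHIFT


def num_pages_checked(count, offset):
    """Reject a zero count and any count whose rounded-up sum would wrap.

    Assumes 0 <= offset < PAGE_SIZE (an offset within the first page).
    """
    if count == 0 or count > U64_MAX - offset - PAGE_SIZE + 1:
        raise ValueError("invalid count")
    return (count + offset + PAGE_SIZE - 1) >> PAGE_SHIFT
```

With a huge `count`, the unchecked version wraps around to a tiny page count, which is how a too-small "pages" array would be allocated.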

Link: http://lkml.kernel.org/r/20181218082129.GE32567@kadam
Fixes: 6db7199407ca ("drivers/virt: introduce Freescale hypervisor management driver")
Signed-off-by: Dan Carpenter
Reviewed-by: Andrew Morton
Cc: Timur Tabi
Cc: Mihai Caraman
Cc: Kumar Gala
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dan Carpenter
2019-05-15 10:52:52 +0800
c8ea3663f drivers/virt/fsl_hypervisor.c: dereferencing error pointers in ioctl

strndup_user() returns error pointers on error, and then in the error
handling we pass the error pointers to kfree(). It will cause an Oops.

Link: http://lkml.kernel.org/r/20181218082003.GD32567@kadam
Fixes: 6db7199407ca ("drivers/virt: introduce Freescale hypervisor management driver")
Signed-off-by: Dan Carpenter
Reviewed-by: Andrew Morton
Cc: Timur Tabi
Cc: Mihai Caraman
Cc: Kumar Gala
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dan Carpenter
2019-05-15 10:52:52 +0800
871789d4a mm, memcg: rename ambiguously named memory.stat counters and functions

I spent literally an hour trying to work out why an earlier version of
my memory.events aggregation code doesn't work properly, only to find
out I was calling memcg->events instead of memcg->memory_events, which
is fairly confusing.

This naming seems in need of reworking, so make it harder to do the
wrong thing by using vmevents instead of events, which makes it more
clear that these are vm counters rather than memcg-specific counters.

There are also a few other inconsistent names in both the percpu and
aggregated structs, so these are all cleaned up to be more coherent and
easy to understand.

This commit contains code cleanup only: there are no logic changes.

[akpm@linux-foundation.org: fix it for preceding changes]
Link: http://lkml.kernel.org/r/20190208224319.GA23801@chrisdown.name
Signed-off-by: Chris Down
Acked-by: Johannes Weiner
Cc: Michal Hocko
Cc: Tejun Heo
Cc: Roman Gushchin
Cc: Dennis Zhou
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Chris Down
2019-05-15 10:52:52 +0800
b09e89366 arch: remove <asm/sizes.h> and <asm-generic/sizes.h>

Now that all instances of #include <asm/sizes.h> have been replaced with
#include <linux/sizes.h>, we can remove these.

Link: http://lkml.kernel.org/r/1553267665-27228-2-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Masahiro Yamada
2019-05-15 10:52:52 +0800
87dfb311b treewide: replace #include <asm/sizes.h> with #include <linux/sizes.h>

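Since commit dccd2304cc90 ("ARM: 7430/1: sizes.h: move from asm-generic
to <linux/sizes.h>"), <asm/sizes.h> and <asm-generic/sizes.h> are just
wrappers of <linux/sizes.h>.

This commit replaces all includes of <asm/sizes.h> and
<asm-generic/sizes.h> with <linux/sizes.h> to prepare for the removal.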

Link: http://lkml.kernel.org/r/1553267665-27228-1-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Masahiro Yamada
2019-05-15 10:52:52 +0800
3813393f5 fs/block_dev.c: Remove duplicate header

linux/dax.h is included more than once.

Link: http://lkml.kernel.org/r/5c867e95.1c69fb81.4f15a.e5e4@mx.google.com
Signed-off-by: Sabyasachi Gupta
Acked-by: Souptick Joarder
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Sabyasachi Gupta
2019-05-15 10:52:52 +0800
081d7d35f fs/cachefiles/namei.c: remove duplicate header

linux/xattr.h is included more than once.

Link: http://lkml.kernel.org/r/5c86803d.1c69fb81.1a7c6.2b78@mx.google.com
Signed-off-by: Sabyasachi Gupta
Acked-by: Souptick Joarder
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Sabyasachi Gupta
2019-05-15 10:52:52 +0800
9e9291c71 include/linux/sched/signal.h: replace `tsk' with `task'

This file uses "task" 85 times and "tsk" 25 times. It is better to be
consistent.

Link: http://lkml.kernel.org/r/20181129180547.15976-1-avagin@gmail.com
Signed-off-by: Andrei Vagin
Reviewed-by: Andrew Morton
Cc: Oleg Nesterov
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrei Vagin
2019-05-15 10:52:52 +0800
10bcba8c1 fs/coda/psdev.c: remove duplicate header

linux/poll.h is included more than once.

Link: http://lkml.kernel.org/r/5c86820f.1c69fb81.149f0.0834@mx.google.com
Signed-off-by: Sabyasachi Gupta
Acked-by: Souptick Joarder
Cc: Jan Harkes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Sabyasachi Gupta
2019-05-15 10:52:52 +0800
99db46ea2 ipc: do cyclic id allocation for the ipc object.

For ipcmni_extend mode, the sequence number space is only 7 bits. So
the chance of id reuse is relatively high compared with the non-extended
mode.

To alleviate this id reuse problem, this patch enables cyclic allocation
for the index to the radix tree (idx). The disadvantage is that this
can cause a slight slow-down of the fast path, as the radix tree could
be higher than necessary.

To limit the radix tree height, I have chosen the following limits:
1) The cycling is done over in_use*1.5.
2) At least, the cycling is done over
   "normal" ipcmni mode: RADIX_TREE_MAP_SIZE elements
   "ipcmni_extended": 4096 elements

Result:
- for normal mode:
No change for 4095 active objects until the 3rd level
is added without cyclic allocation.

For a 2-level radix tree compared to a 1-level radix tree, I have
observed < 1% performance impact.
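
The limits and the cyclic behaviour above can be sketched in Python as follows. This is an illustrative model only: `cycle_limit()` and `CyclicAllocator` are hypothetical names, a set stands in for the radix tree, and the value 64 for RADIX_TREE_MAP_SIZE is an assumption (it depends on the kernel configuration).

```python
RADIX_TREE_MAP_SIZE = 64  # assumption: typical value, config dependent
EXTENDED_MIN_CYCLE = 4096


def cycle_limit(in_use, extended):
    """Upper bound of the index range that the allocation cycles over."""
    floor = EXTENDED_MIN_CYCLE if extended else RADIX_TREE_MAP_SIZE
    return max(in_use * 3 // 2, floor)


class CyclicAllocator:
    """Hands out the next free index after the last one, wrapping at the limit."""

    def __init__(self, extended=False):
        self.extended = extended
        self.used = set()   # stand-in for the radix tree
        self.next_idx = 0

    def alloc(self):
        limit = cycle_limit(len(self.used), self.extended)
        for step in range(limit):
            idx = (self.next_idx + step) % limit
            if idx not in self.used:
                self.used.add(idx)
                self.next_idx = idx + 1
                return idx
        raise RuntimeError("no free index")

    def free(self, idx):
        self.used.discard(idx)
```

Allocating, freeing, and allocating again yields a fresh index rather than the one just released, which is the point of cycling: a stale id held by userspace is less likely to match a new object.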

Notes:
1) Normal "x=semget();y=semget();" is unaffected: Then the idx
is e.g. a and a+1, regardless if idr_alloc() or idr_alloc_cyclic()
is used.

2) The -1% happens in a microbenchmark after this situation:
x=semget();
for(i=0;i<
Acked-by: Waiman Long
Cc: "Luis R. Rodriguez"
Cc: Kees Cook
Cc: Jonathan Corbet
Cc: Al Viro
Cc: Matthew Wilcox
Cc: "Eric W . Biederman"
Cc: Takashi Iwai
Cc: Davidlohr Bueso
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Manfred Spraul
2019-05-15 10:52:52 +0800
3278a2c20 ipc: conserve sequence numbers in ipcmni_extend mode

Rewrite, based on the patch from Waiman Long:

The mixing in of a sequence number into the IPC IDs is probably to avoid
ID reuse in userspace as much as possible. With ipcmni_extend mode, the
number of usable sequence numbers is greatly reduced leading to higher
chance of ID reuse.

To address this issue, we need to conserve the sequence number space as
much as possible. Right now, the sequence number is incremented for
every new ID created. In reality, we only need to increment the
sequence number when the newly allocated ID is not greater than the last one
allocated. Only in that case can the new ID collide with an
existing one. This is being done irrespective of the ipcmni mode.

In order to avoid any races, the index is first allocated and then the
pointer is replaced.
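
The rule for when to bump the sequence number can be sketched as below. This is a simplified Python model under stated assumptions: the `IpcIds` class and its fields are hypothetical, `seq_max` is treated as the largest allowed sequence value, and the locking and idr_replace() step from the patch are left out.

```python
class IpcIds:
    """Toy model: build an id from (sequence number, index)."""

    def __init__(self, index_bits, seq_max):
        self.index_bits = index_bits
        self.seq_max = seq_max  # assumption: largest valid sequence value
        self.seq = 0
        self.last_idx = -1

    def new_id(self, idx):
        # Only an index at or below the previous one can repeat an id that
        # was handed out before, so only then is a new sequence number used.
        if idx <= self.last_idx:
            self.seq = (self.seq + 1) % (self.seq_max + 1)
        self.last_idx = idx
        return (self.seq << self.index_bits) | idx
```

As long as indexes keep growing, the sequence number stays put; it advances once per wrap of the index rather than once per allocation.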

Changes compared to the initial patch:
- Handle failures from idr_alloc().
- Prevent concurrent operations from seeing the wrong sequence number.
(This is achieved by using idr_replace()).
- IPCMNI_SEQ_SHIFT is not a constant, thus renamed to
ipcmni_seq_shift().
- IPCMNI_SEQ_MAX is not a constant, thus renamed to ipcmni_seq_max().

Link: http://lkml.kernel.org/r/20190329204930.21620-2-longman@redhat.com
Signed-off-by: Manfred Spraul
Signed-off-by: Waiman Long
Suggested-by: Matthew Wilcox
Acked-by: Waiman Long
Cc: Al Viro
Cc: Davidlohr Bueso
Cc: "Eric W . Biederman"
Cc: Jonathan Corbet
Cc: Kees Cook
Cc: "Luis R. Rodriguez"
Cc: Takashi Iwai
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Manfred Spraul
2019-05-15 10:52:52 +0800
5ac893b8c ipc: allow boot time extension of IPCMNI from 32k to 16M

The maximum number of unique System V IPC identifiers was limited to
32k. That limit should be big enough for most use cases.

However, there are some users out there requesting more, especially
those that are migrating from Solaris which uses 24 bits for unique
identifiers. To satisfy the need of those users, a new boot time kernel
option "ipcmni_extend" is added to extend the IPCMNI value to 16M. This
is a 512X increase which should be big enough for users out there that
need a large number of unique IPC identifiers.

The use of this new option will change the pattern of the IPC
identifiers returned by functions like shmget(2). An application that
depends on such a pattern may not work properly. So it should only be
used if the users really need more than 32k unique IPC identifiers.

This new option does have the side effect of reducing the maximum number
of unique sequence numbers from 64k down to 128. So it is a trade-off.
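
The trade-off follows from splitting a fixed number of id bits between index and sequence number. The Python sketch below works through the arithmetic. The 31-bit total is inferred from the figures quoted above (32k x 64k and 16M x 128 both equal 2^31), and the helper names are hypothetical.

```python
ID_BITS = 31  # inferred from the quoted figures: 32k * 64k == 16M * 128 == 2**31


def layout(index_bits):
    """Return (number of indexes, number of sequence values)."""
    seq_bits = ID_BITS - index_bits
    return 1 << index_bits, 1 << seq_bits


def make_id(seq, idx, index_bits):
    """Pack a sequence number and an index into one id."""
    return (seq << index_bits) | idx


def split_id(ipc_id, index_bits):
    """Unpack an id into (sequence number, index)."""
    return ipc_id >> index_bits, ipc_id & ((1 << index_bits) - 1)
```

With 15 index bits, `layout()` gives 32768 indexes and 65536 sequence values; with 24 index bits it gives 16777216 indexes and 128 sequence values, a 512-fold increase in indexes.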

The computation of a new IPC id is not done in the performance critical
path. So a little bit of additional overhead shouldn't have any real
performance impact.

Link: http://lkml.kernel.org/r/20190329204930.21620-1-longman@redhat.com
Signed-off-by: Waiman Long
Acked-by: Manfred Spraul
Cc: Al Viro
Cc: Davidlohr Bueso
Cc: "Eric W . Biederman"
Cc: Jonathan Corbet
Cc: Kees Cook
Cc: "Luis R. Rodriguez"
Cc: Matthew Wilcox
Cc: Takashi Iwai
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Waiman Long
2019-05-15 10:52:52 +0800
a5091fda4 ipc/mqueue: optimize msg_get()

Our msg priorities became an rbtree as of d6629859b36d ("ipc/mqueue:
improve performance of send/recv"). However, consuming a msg in
msg_get() remains logarithmic (still better than before, of
course). By applying well-known techniques to cache pointers we can
get the node with the highest priority in O(1), which is especially nice
for the rt cases. Furthermore, some callers can call msg_get() in a
loop.
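
The pointer-caching idea can be illustrated with a small Python model. It is a sketch under loud assumptions: `MsgTree` and its fields are hypothetical, and a sorted list stands in for the rbtree, so only the bookkeeping of the cached highest-priority entry is representative, not the tree itself.

```python
import bisect
from collections import deque


class MsgTree:
    """Messages grouped by priority, with the highest priority cached."""

    def __init__(self):
        self.prios = []        # sorted priorities; stand-in for the rbtree
        self.queues = {}       # priority -> FIFO of messages
        self.rightmost = None  # cached highest priority, updated on change

    def insert(self, prio, msg):
        if prio not in self.queues:
            bisect.insort(self.prios, prio)
            self.queues[prio] = deque()
            if self.rightmost is None or prio > self.rightmost:
                self.rightmost = prio
        self.queues[prio].append(msg)

    def get(self):
        """Return the oldest message of the highest priority, or None."""
        if self.rightmost is None:
            return None
        prio = self.rightmost
        queue = self.queues[prio]
        msg = queue.popleft()
        if not queue:
            self._erase_highest()
        return msg

    def _erase_highest(self):
        # Removal and cache update live in one helper so they cannot drift.
        del self.queues[self.prios.pop()]
        self.rightmost = self.prios[-1] if self.prios else None
```

Keeping the erase and the cache update together mirrors the motivation given for a dedicated erase helper: the cached pointer must never outlive the node it refers to.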

A new msg_tree_erase() helper is also added to encapsulate the tree
removal and node_cache game. Passes ltp mq testcases.

Link: http://lkml.kernel.org/r/20190321190216.1719-2-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso
Cc: Manfred Spraul
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Davidlohr Bueso
2019-05-15 10:52:52 +0800
0ecb58210 ipc/mqueue: remove redundant wq task assignment

We already store the current task for the new waiter before calling
wq_sleep() in both send and recv paths. Trivially remove the redundant
assignment.

Link: http://lkml.kernel.org/r/20190321190216.1719-1-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso
Cc: Manfred Spraul
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Davidlohr Bueso
2019-05-15 10:52:52 +0800
d6a2946a8 ipc: prevent lockup on alloc_msg and free_msg

msgctl10 of ltp triggers the following lockup. When CONFIG_KASAN is
enabled on large memory SMP systems, page initialization can take a
long time; if msgctl10 requests a huge block of memory, it blocks the
rcu scheduler. So release the cpu actively.

After adding schedule() in free_msg(), free_msg() can no longer be called
while holding a spinlock, so add the msgs to a tmp list and free them
outside the spinlock.

rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: Tasks blocked on level-1 rcu_node (CPUs 16-31): P32505
rcu: Tasks blocked on level-1 rcu_node (CPUs 48-63): P34978
rcu: (detected by 11, t=35024 jiffies, g=44237529, q=16542267)
msgctl10 R running task 21608 32505 2794 0x00000082
Call Trace:
preempt_schedule_irq+0x4c/0xb0
retint_kernel+0x1b/0x2d
RIP: 0010:__is_insn_slot_addr+0xfb/0x250
Code: 82 1d 00 48 8b 9b 90 00 00 00 4c 89 f7 49 c1 ee 03 e8 59 83 1d 00 48 b8 00 00 00 00 00 fc ff df 4c 39 eb 48 89 9d 58 ff ff ff c6 04 06 f8 74 66 4c 8d 75 98 4c 89 f1 48 c1 e9 03 48 01 c8 48
RSP: 0018:ffff88bce041f758 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
RAX: dffffc0000000000 RBX: ffffffff8471bc50 RCX: ffffffff828a2a57
RDX: dffffc0000000000 RSI: dffffc0000000000 RDI: ffff88bce041f780
RBP: ffff88bce041f828 R08: ffffed15f3f4c5b3 R09: ffffed15f3f4c5b3
R10: 0000000000000001 R11: ffffed15f3f4c5b2 R12: 000000318aee9b73
R13: ffffffff8471bc50 R14: 1ffff1179c083ef0 R15: 1ffff1179c083eec
kernel_text_address+0xc1/0x100
__kernel_text_address+0xe/0x30
unwind_get_return_address+0x2f/0x50
__save_stack_trace+0x92/0x100
create_object+0x380/0x650
__kmalloc+0x14c/0x2b0
load_msg+0x38/0x1a0
do_msgsnd+0x19e/0xcf0
do_syscall_64+0x117/0x400
entry_SYSCALL_64_after_hwframe+0x49/0xbe

rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: Tasks blocked on level-1 rcu_node (CPUs 0-15): P32170
rcu: (detected by 14, t=35016 jiffies, g=44237525, q=12423063)
msgctl10 R running task 21608 32170 32155 0x00000082
Call Trace:
preempt_schedule_irq+0x4c/0xb0
retint_kernel+0x1b/0x2d
RIP: 0010:lock_acquire+0x4d/0x340
Code: 48 81 ec c0 00 00 00 45 89 c6 4d 89 cf 48 8d 6c 24 20 48 89 3c 24 48 8d bb e4 0c 00 00 89 74 24 0c 48 c7 44 24 20 b3 8a b5 41 c1 ed 03 48 c7 44 24 28 b4 25 18 84 48 c7 44 24 30 d0 54 7a 82
RSP: 0018:ffff88af83417738 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
RAX: dffffc0000000000 RBX: ffff88bd335f3080 RCX: 0000000000000002
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88bd335f3d64
RBP: ffff88af83417758 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: ffffed13f3f745b2 R12: 0000000000000000
R13: 0000000000000002 R14: 0000000000000000 R15: 0000000000000000
is_bpf_text_address+0x32/0xe0
kernel_text_address+0xec/0x100
__kernel_text_address+0xe/0x30
unwind_get_return_address+0x2f/0x50
__save_stack_trace+0x92/0x100
save_stack+0x32/0xb0
__kasan_slab_free+0x130/0x180
kfree+0xfa/0x2d0
free_msg+0x24/0x50
do_msgrcv+0x508/0xe60
do_syscall_64+0x117/0x400
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Davidlohr said:
"So after releasing the lock, the msg rbtree/list is empty and new
calls will not see those in the newly populated tmp_msg list, and
therefore they cannot access the delayed msg freeing pointers, which
is good. Also the fact that the node_cache is now freed before the
actual messages seems to be harmless as this is wanted for
msg_insert() avoiding GFP_ATOMIC allocations, and after releasing the
info->lock the thing is freed anyway so it should not change things"
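
The "move to a tmp list, free outside the lock" pattern discussed above can be sketched in Python. This is a generic illustration and not the mqueue code: `MessageQueue`, `send()`, and `destroy()` are hypothetical names, and a `threading.Lock` stands in for the spinlock.

```python
import threading


class MessageQueue:
    def __init__(self):
        self.lock = threading.Lock()  # stand-in for the spinlock
        self.msgs = []

    def send(self, msg):
        with self.lock:
            self.msgs.append(msg)

    def destroy(self, free_msg):
        # Under the lock, only detach the messages; this is cheap and leaves
        # the shared list empty for anyone who takes the lock afterwards.
        with self.lock:
            tmp_msgs, self.msgs = self.msgs, []
        # The potentially slow (or sleeping) freeing happens without the lock.
        for msg in tmp_msgs:
            free_msg(msg)
```

Because the detached list is private to the caller, no other path can reach the messages between the unlock and the free.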

Link: http://lkml.kernel.org/r/1552029161-4957-1-git-send-email-lirongqing@baidu.com
Signed-off-by: Li RongQing
Signed-off-by: Zhang Yu
Reviewed-by: Davidlohr Bueso
Cc: Manfred Spraul
Cc: Arnd Bergmann
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Li Rongqing
2019-05-15 10:52:52 +0800
e7e6f462c scripts/gdb: print cached rate in lx-clk-summary

The clk rate is always stored in clk_core but might be out of date and
require calls to update from hardware.

Deal with that case by printing a (c) suffix.

Link: http://lkml.kernel.org/r/1a474318982a5f0125f2360c4161029b17f56bd1.1556881728.git.leonard.crestez@nxp.com
Signed-off-by: Leonard Crestez
Cc: Jan Kiszka
Cc: Jason Wessel
Cc: Kieran Bingham
Cc: Stephen Boyd
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Leonard Crestez
2019-05-15 10:52:52 +0800
66d5c7c60 scripts/gdb: clean up error handling in list helpers

An incorrect argument to list_for_each is an internal error in gdb
scripts so a TypeError should be raised. The gdb.GdbError exception
type is intended for user errors such as incorrect invocation.

Drop the type assertion in list_for_each_entry because list_for_each
isn't going to suddenly yield something else.

Applies to both list and hlist

Link: http://lkml.kernel.org/r/c1d3fd4db13d999a3ba57f5bbc1924862d824f61.1556881728.git.leonard.crestez@nxp.com
Signed-off-by: Leonard Crestez
Reviewed-by: Stephen Boyd
Cc: Jan Kiszka
Cc: Jason Wessel
Cc: Kieran Bingham
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Leonard Crestez
2019-05-15 10:52:52 +0800
988b26861 scripts/gdb: add $lx_clk_core_lookup function

Finding an individual clk_core requires walking the tree which can be
quite complicated so add a helper for easy access.

(gdb) print *(struct clk_scu*)$lx_clk_core_lookup("uart0_clk")->hw

Link: http://lkml.kernel.org/r/Message-ID:
Signed-off-by: Leonard Crestez
Cc: Jan Kiszka
Cc: Jason Wessel
Cc: Kieran Bingham
Cc: Stephen Boyd
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Leonard Crestez
2019-05-15 10:52:52 +0800
d1e9710b6 scripts/gdb: initial clk support: lx-clk-summary

Add an lx-clk-summary command which prints a subset of
/sys/kernel/debug/clk/clk_summary.

This can be used to examine hangs caused by clk not being enabled.

Link: http://lkml.kernel.org/r/Message-ID:
Signed-off-by: Leonard Crestez
Cc: Jan Kiszka
Cc: Jason Wessel
Cc: Kieran Bingham
Cc: Stephen Boyd
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Leonard Crestez
2019-05-15 10:52:52 +0800
47d0d1285 scripts/gdb: add hlist utilities

This allows easily examining kernel hlists in python.

Link: http://lkml.kernel.org/r/Message-ID:
Signed-off-by: Leonard Crestez
Reviewed-by: Stephen Boyd
Cc: Jason Wessel
Cc: Jan Kiszka
Cc: Kieran Bingham
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Leonard Crestez
2019-05-15 10:52:52 +0800
494dbe02b scripts/gdb: silence pep8 checks

These scripts have some pep8 style warnings. Fix them up so that this
directory is all pep8 clean.

Link: http://lkml.kernel.org/r/20190329220844.38234-6-swboyd@chromium.org
Signed-off-by: Stephen Boyd
Cc: Douglas Anderson
Cc: Nikolay Borisov
Cc: Kieran Bingham
Cc: Jan Kiszka
Cc: Jackie Liu
Cc: Jason Wessel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Stephen Boyd
2019-05-15 10:52:52 +0800
442284a89 scripts/gdb: add a timer list command

Implement a command to print the timer list, much like how
/proc/timer_list is implemented. This can be used to look at the
pending timers on a crashed system.

[swboyd@chromium.org: v2]
Link: http://lkml.kernel.org/r/20190329220844.38234-5-swboyd@chromium.org
Link: http://lkml.kernel.org/r/20190325184522.260535-5-swboyd@chromium.org
Signed-off-by: Stephen Boyd
Cc: Douglas Anderson
Cc: Nikolay Borisov
Cc: Kieran Bingham
Cc: Jan Kiszka
Cc: Jackie Liu
Cc: Jason Wessel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Stephen Boyd
2019-05-15 10:52:52 +0800
449ca0c95 scripts/gdb: add rb tree iterating utilities

Implement gdb functions for rb_first(), rb_last(), rb_next(), and
rb_prev(). These can be useful to iterate through the kernel's
red-black trees.
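
For readers unfamiliar with these helpers, the traversal logic can be sketched in plain Python without gdb. This is an illustrative model, not the script added by the patch: the `Node` class and the unbalanced `insert()` helper are assumptions, and only the first/last/next/prev walking logic corresponds to what the text names.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.parent = None
        self.left = None
        self.right = None


def insert(root, key):
    """Plain binary-search-tree insert (no rebalancing); returns the root."""
    new = Node(key)
    if root is None:
        return new
    node = root
    while True:
        side = "left" if key < node.key else "right"
        child = getattr(node, side)
        if child is None:
            setattr(node, side, new)
            new.parent = node
            return root
        node = child


def _extreme(node, side):
    """Follow `side` links as far as possible; None for an empty tree."""
    while node is not None and getattr(node, side) is not None:
        node = getattr(node, side)
    return node


def _step(node, toward, away):
    """In-order neighbour: successor for toward='right', else predecessor."""
    child = getattr(node, toward)
    if child is not None:
        return _extreme(child, away)
    parent = node.parent
    while parent is not None and node is getattr(parent, toward):
        node, parent = parent, parent.parent
    return parent


def rb_first(root):
    return _extreme(root, "left")


def rb_last(root):
    return _extreme(root, "right")


def rb_next(node):
    return _step(node, "right", "left")


def rb_prev(node):
    return _step(node, "left", "right")
```

Iterating is then a loop that starts at `rb_first(root)` and calls `rb_next()` until it returns `None`, which visits the keys in ascending order.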

[swboyd@chromium.org: v2]
Link: http://lkml.kernel.org/r/20190329220844.38234-4-swboyd@chromium.org
Link: http://lkml.kernel.org/r/20190325184522.260535-4-swboyd@chromium.org
Signed-off-by: Stephen Boyd
Cc: Douglas Anderson
Cc: Nikolay Borisov
Cc: Kieran Bingham
Cc: Jan Kiszka
Cc: Jackie Liu
Cc: Jason Wessel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Stephen Boyd
2019-05-15 10:52:51 +0800
90cf83dbd scripts/gdb: add kernel config dumping command

lx-configdump dumps the contents of the gzipped .config to a text
file when the config is included in the kernel with CONFIG_IKCONFIG. By
default, the file written is called config.txt, but it can be any user
supplied filename as well. If the kernel config is in a module
(configs.ko), then it can be loaded along with symbols for the module
loaded with 'lx-symbols' and then this command will still work.

Obviously if you have the whole vmlinux then this can also be achieved
with scripts/extract-ikconfig, but this gdb script can be useful to
confirm that the memory contents of the config in memory and the vmlinux
contents on disk match what is expected.
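
As a rough picture of what such a dump involves, the Python sketch below pulls a gzipped config out of a byte blob and decompresses it. It works on an in-memory `bytes` object rather than on gdb memory. The `IKCFG_ST` and `IKCFG_ED` marker strings are stated here as an assumption about how the embedded config is delimited, and `extract_config()` is a hypothetical helper.

```python
import gzip

MAGIC_START = b"IKCFG_ST"  # assumption: marker before the gzipped config
MAGIC_END = b"IKCFG_ED"    # assumption: marker after the gzipped config


def extract_config(blob):
    """Return the decompressed config text found between the two markers.

    Raises ValueError if either marker is missing. The first end marker
    after the start is used, which is adequate for a sketch but could
    mis-fire if the compressed bytes happened to contain the marker.
    """
    start = blob.index(MAGIC_START) + len(MAGIC_START)
    end = blob.index(MAGIC_END, start)
    return gzip.decompress(blob[start:end]).decode()
```

Writing the returned text to a file such as config.txt would give the same kind of output the command description mentions.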

[swboyd@chromium.org: v2]
Link: http://lkml.kernel.org/r/20190329220844.38234-3-swboyd@chromium.org
Link: http://lkml.kernel.org/r/20190325184522.260535-3-swboyd@chromium.org
Signed-off-by: Stephen Boyd
Cc: Douglas Anderson
Cc: Nikolay Borisov
Cc: Kieran Bingham
Cc: Jan Kiszka
Cc: Jackie Liu
Cc: Jason Wessel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Stephen Boyd
2019-05-15 10:52:51 +0800
dfe4529ee scripts/gdb: find vmlinux where it was before

Patch series "gdb script for kconfig and timer list".

This is a handful of changes to the kernel's gdb scripts to do some more
debugging with kgdb. The first patch allows the vmlinux to be reloaded
from where it was specified on the command line so that this set of
scripts can be used from anywhere. The second patch adds a script to
dump the config.gz to a file on the host debugging machine. The third
patch adds some rb tree utilities and the last patch uses those rb tree
walking utilities to dump out the contents of /proc/timer_list from a
system under debug.

This patch (of 5):

If I run 'gdb ' and there's the vmlinux-gdb.py file
there I can properly see symbols and use the lx commands provided by the
GDB scripts. But once I run 'lx-symbols' at the command prompt, gdb
reloads the vmlinux symbols assuming that this script was run from the
directory that has vmlinux at the root. That isn't always true, but we
could just look and see what symbols were already loaded and use that
instead. Let's do that so this can work by being invoked anywhere.

Link: http://lkml.kernel.org/r/20190325184522.260535-2-swboyd@chromium.org
Signed-off-by: Stephen Boyd
Cc: Douglas Anderson
Cc: Nikolay Borisov
Cc: Kieran Bingham
Cc: Jan Kiszka
Cc: Jackie Liu
Cc: Jason Wessel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Stephen Boyd
2019-05-15 10:52:51 +0800
4c69add45 pps: pps-gpio PPS ECHO implementation

This patch implements the PPS ECHO functionality for pps-gpio, that
sysfs claims is available already.

Configuration is done via device tree bindings.

No changes are made to userspace interfaces.

This patch was originally written by Lukas Senger as part of a master's
thesis project and modified for inclusion into the Linux kernel by Tom
Burkart.

Link: http://lkml.kernel.org/r/20190324043305.6627-4-tom@aussec.com
Signed-off-by: Tom Burkart
Acked-by: Rodolfo Giometti
Signed-off-by: Lukas Senger
Cc: Philipp Zabel
Cc: Rob Herring
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Tom Burkart
2019-05-15 10:52:51 +0800
652e22185 dt-bindings: pps: pps-gpio PPS ECHO implementation

This patch implements the device tree binding changes required for the
PPS ECHO functionality for pps-gpio, that sysfs claims is available
already.

It adds two DT properties for configuring the PPS ECHO functionality.

This patch is provided separately from the rest of the series per
Documentation/devicetree/bindings/submitting-patches.txt.

This patch was originally written by Lukas Senger as part of a master's
thesis project and modified for inclusion into the Linux kernel by Tom
Burkart.

Link: http://lkml.kernel.org/r/20190324043305.6627-3-tom@aussec.com
Signed-off-by: Tom Burkart
Signed-off-by: Lukas Senger
Acked-by: Rodolfo Giometti
Reviewed-by: Rob Herring
Cc: Philipp Zabel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Tom Burkart
2019-05-15 10:52:51 +0800
4461d6517 pps: descriptor-based gpio

This patch changes the GPIO access for the pps-gpio driver from the
integer based API to the descriptor based API.

The integer based API is considered deprecated and the descriptor based
API is the preferred way to access GPIOs as per
Documentation/driver-api/gpio/intro.rst

No changes are made to userspace interfaces.

Link: http://lkml.kernel.org/r/20190324043305.6627-2-tom@aussec.com
Signed-off-by: Tom Burkart
Acked-by: Rodolfo Giometti
Reviewed-by: Philipp Zabel
Cc: Lukas Senger
Cc: Rob Herring
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Tom Burkart
2019-05-15 10:52:51 +0800
b287a25a7 panic/reboot: allow specifying reboot_mode for panic only

Allow specifying reboot_mode for panic only. This is needed on systems
where ramoops is used to store panic logs, and user wants to use warm
reset to preserve those, while still having cold reset on normal
reboots.

Link: http://lkml.kernel.org/r/20190322004735.27702-1-aaro.koskinen@iki.fi
Signed-off-by: Aaro Koskinen
Reviewed-by: Kees Cook
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Aaro Koskinen
2019-05-15 10:52:51 +0800
c39ea0b9d panic: avoid the extra noise dmesg

When a kernel panic happens, it first prints the panic call stack,
then the ending msg like:

[ 35.743249] ---[ end Kernel panic - not syncing: Fatal exception
[ 35.749975] ------------[ cut here ]------------

The above messages are very useful for debugging.

But if the system is configured not to reboot on panic, say the
"panic_timeout" parameter equals 0, it will likely print many noisy
messages, such as a WARN() call stack for each and every CPU except the
panicking one, like below:

WARNING: CPU: 1 PID: 280 at kernel/sched/core.c:1198 set_task_cpu+0x183/0x190
Call Trace:

try_to_wake_up
default_wake_function
autoremove_wake_function
__wake_up_common
__wake_up_common_lock
__wake_up
wake_up_klogd_work_func
irq_work_run_list
irq_work_tick
update_process_times
tick_sched_timer
__hrtimer_run_queues
hrtimer_interrupt
smp_apic_timer_interrupt
apic_timer_interrupt

For people working in console mode, the screen will first show the panic
call stack, which is immediately overwritten by these noisy extra messages.
This makes debugging much more difficult, as the original context gets
lost on screen.

Also these noisy messages confuse some users: I have seen many bug
reporters post the noisy messages to bugzilla instead of the real
panic call stack and context.

Add a flag "suppress_printk" which gets set in panic() to avoid those
noisy messages, without changing the current kernel behavior that both panic
blinking and the sysrq magic key keep working, as suggested by Petr Mladek.

To verify this, make sure the kernel is not configured to reboot on panic and
run in the console
# echo c > /proc/sysrq-trigger
to see if the console only prints out the panic call stack.

Link: http://lkml.kernel.org/r/1551430186-24169-1-git-send-email-feng.tang@intel.com
Signed-off-by: Feng Tang
Suggested-by: Petr Mladek
Reviewed-by: Petr Mladek
Acked-by: Steven Rostedt (VMware)
Acked-by: Sergey Senozhatsky
Cc: Thomas Gleixner
Cc: Kees Cook
Cc: Borislav Petkov
Cc: Andi Kleen
Cc: Peter Zijlstra
Cc: Greg Kroah-Hartman
Cc: Jiri Slaby
Cc: Sasha Levin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Feng Tang
2019-05-15 10:52:51 +0800
e178a5beb gcov: clang support

LLVM uses profiling data that's deliberately similar to GCC, but has a
very different way of exporting that data. LLVM calls llvm_gcov_init()
once per module, and provides a couple of callbacks that we can use to
ask for more data.

We care about the "writeout" callback, which in turn calls back into
compiler-rt/this module to dump all the gathered coverage data to disk:

llvm_gcda_start_file()
llvm_gcda_emit_function()
llvm_gcda_emit_arcs()
llvm_gcda_emit_function()
llvm_gcda_emit_arcs()
[... repeats for each function ...]
llvm_gcda_summary_info()
llvm_gcda_end_file()

This design is much more stateless and unstructured than gcc's, and is
intended to run at process exit. This forces us to keep some local
state about which module we're dealing with at the moment. On the other
hand, it also means we don't depend as much on how LLVM represents
profiling data internally.
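
The need for local state comes from the callbacks carrying no handle for the file or function they belong to. The Python sketch below models that with a collector that remembers the "current" file and function. It is a simplified illustration: the `GcovCollector` class and its method signatures are hypothetical and much reduced compared with the real llvm_gcda_*() callbacks.

```python
class GcovCollector:
    """Accumulates coverage data from a stateless stream of callbacks."""

    def __init__(self):
        self.files = {}        # file name -> {function ident -> counters}
        self._file = None      # file currently being written out
        self._function = None  # function whose arcs come next

    def start_file(self, name):
        self._file = self.files.setdefault(name, {})
        self._function = None

    def emit_function(self, ident):
        if self._file is None:
            raise RuntimeError("emit_function() outside start/end file")
        self._function = self._file.setdefault(ident, [])

    def emit_arcs(self, counters):
        if self._function is None:
            raise RuntimeError("emit_arcs() without a preceding function")
        self._function.extend(counters)

    def end_file(self):
        self._file = None
        self._function = None
```

Each `emit_arcs()` call is attributed to whichever function was announced last, which is exactly the implicit ordering the callback sequence relies on.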

See LLVM's lib/Transforms/Instrumentation/GCOVProfiling.cpp for more
details on how this works, particularly GCOVProfiler::emitProfileArcs(),
GCOVProfiler::insertCounterWriteout(), and GCOVProfiler::insertFlush().

[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20190417225328.208129-1-trong@android.com
Signed-off-by: Greg Hackmann
Signed-off-by: Nick Desaulniers
Signed-off-by: Tri Vo
Co-developed-by: Nick Desaulniers
Co-developed-by: Tri Vo
Tested-by: Trilok Soni
Tested-by: Prasad Sodagudi
Tested-by: Tri Vo
Tested-by: Daniel Mentz
Tested-by: Petri Gynther
Reviewed-by: Peter Oberparleiter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Greg Hackmann
2019-05-15 10:52:51 +0800
aa069a23a gcov: docs: add a note on GCC vs Clang differences

Document some things of note to gcov users:
1. GCC gcov and Clang llvm-cov tools are not compatible.
2. The use of GCC vs Clang is transparent at build-time.

Also adjust the documentation to account for the removal of config symbol
CONFIG_GCOV_FORMAT_AUTODETECT by commit 6a61b70b43c9 ("gcov: remove
CONFIG_GCOV_FORMAT_AUTODETECT").

Link: http://lkml.kernel.org/r/20190318025411.98014-4-trong@android.com
Signed-off-by: Tri Vo
Reviewed-by: Peter Oberparleiter
Cc: Daniel Mentz
Cc: Greg Hackmann
Cc: Nick Desaulniers
Cc: Petri Gynther
Cc: Prasad Sodagudi
Cc: Trilok Soni
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Tri Vo
2019-05-15 10:52:51 +0800
826eba0d7 gcov: clang: move common GCC code into gcc_base.c

Patch series "gcov: add Clang support", v4.

This patch (of 3):

base.c contains a few callbacks specific to GCC's gcov implementation.
Move these into their own module in preparation for Clang support.

Link: http://lkml.kernel.org/r/20190318025411.98014-2-trong@android.com
Signed-off-by: Greg Hackmann
Signed-off-by: Nick Desaulniers
Signed-off-by: Tri Vo
Tested-by: Trilok Soni
Tested-by: Prasad Sodagudi
Tested-by: Tri Vo
Reviewed-by: Peter Oberparleiter
Cc: Daniel Mentz
Cc: Petri Gynther
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Greg Hackmann
2019-05-15 10:52:51 +0800
ce528c4c2 fs/eventfd.c: make eventfd_ida static

Fix sparse warning:

fs/eventfd.c:26:1: warning:
symbol 'eventfd_ida' was not declared. Should it be static?

Link: http://lkml.kernel.org/r/20190413142348.34716-1-yuehaibing@huawei.com
Signed-off-by: YueHaibing
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

YueHaibing
2019-05-15 10:52:51 +0800
b556db17b eventfd: present id to userspace via fdinfo

Finding the endpoints of an IPC channel is an essential task in
understanding how a user program works. Procfs and netlink sockets provide
enough hints to find the endpoints of IPC channels such as pipes, unix
sockets, and pseudo terminals. However, there is no simple way to find
the endpoints of an eventfd file from userland. The inode number is no
help: unlike pipes, all eventfd files share the same inode object.

To provide a way to find the endpoints of an eventfd file, this patch adds
an "eventfd-id" field to the /proc/PID/fdinfo entry of each eventfd as an
identifier. Integers managed by an IDA are used as ids.

A tool like lsof can utilize the information to print endpoints.
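
As a rough illustration of how such a tool might consume the new field, the
sketch below scans fdinfo text for an "eventfd-id:" line. The helper names
parse_eventfd_id() and read_eventfd_id() are hypothetical, and the exact
layout of the other fdinfo lines is an assumption; only the "eventfd-id"
field name comes from the commit message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Scan fdinfo text for an "eventfd-id:" line and store its value in *id.
 * Returns 0 on success, -1 if the field is absent (for example, the fd is
 * not an eventfd, or the kernel predates this patch).
 */
static int parse_eventfd_id(const char *fdinfo, int *id)
{
	static const char key[] = "eventfd-id:";
	const char *line = fdinfo;

	assert(fdinfo && id);
	while (line && *line) {
		if (strncmp(line, key, sizeof(key) - 1) == 0)
			return sscanf(line + sizeof(key) - 1, "%d", id) == 1 ? 0 : -1;
		line = strchr(line, '\n');
		if (line)
			line++;	/* step past the newline to the next line */
	}
	return -1;
}

/* Read /proc/<pid>/fdinfo/<fd> and extract the id; -1 on any failure. */
static int read_eventfd_id(int pid, int fd, int *id)
{
	char path[64], buf[1024];
	size_t n;
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%d/fdinfo/%d", pid, fd);
	f = fopen(path, "r");
	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return parse_eventfd_id(buf, id);
}
```

Two file descriptors, possibly in different processes, that report the same
id would then refer to the same eventfd, which is the pairing a tool like
lsof needs.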

Link: http://lkml.kernel.org/r/20190327181823.20222-1-yamato@redhat.com
Signed-off-by: Masatake YAMATO
Cc: Al Viro
Cc: Kees Cook
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Masatake YAMATO
2019-05-15 10:52:51 +0800
1fd402df4 kernel/pid.c: remove unneeded hash header file

Hash functions are no longer needed now that idr is used. Remove the hash
header file as a cleanup.

Link: http://lkml.kernel.org/r/20190430053319.95913-1-scuttimmy@gmail.com
Signed-off-by: Timmy Li
Cc: "Eric W. Biederman"
Cc: Michal Hocko
Cc: Matthew Wilcox
Cc: Oleg Nesterov
Cc: Mike Rapoport
Cc: KJ Tsanaktsidis
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Timmy Li
2019-05-15 10:52:51 +0800
3116ad38f kernel/sysctl.c: fix proc_do_large_bitmap for large input buffers

Today, proc_do_large_bitmap() truncates a large write input buffer to
PAGE_SIZE - 1, which may result in misparsed numbers at the (truncated)
end of the buffer. Further, it fails to notify the caller that the
buffer was truncated, so it doesn't get called iteratively to finish the
entire input buffer.

Tell the caller if there's more work to do by adding the skipped amount
back to left/*lenp before returning.

To fix the misparsing, reset the position if we have completely consumed
a truncated buffer (or if just one char is left, which may be a "-" in a
range), and ask the caller to come back for more.
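
The idea can be sketched in user space. This is a simplified illustration
and not the kernel code: it handles only comma-separated numbers (no
ranges), CHUNK is a tiny stand-in for PAGE_SIZE - 1, and parse_chunk() and
parse_all() are hypothetical names. When a number runs into the end of a
truncated chunk, the parser rewinds to the start of that number and lets
the caller come back with the rest.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 5	/* assumption: tiny stand-in for PAGE_SIZE - 1 */

/*
 * Parse comma-separated unsigned numbers from buf[0..len). If `truncated`
 * is set and a number touches the end of the chunk, it may be cut short,
 * so rewind to its first digit and stop. Returns the bytes consumed.
 */
static size_t parse_chunk(const char *buf, size_t len, int truncated,
			  unsigned long *out, size_t *n_out, size_t max_out)
{
	size_t pos = 0;

	assert(buf && out && n_out);
	while (pos < len && *n_out < max_out) {
		size_t start = pos;
		unsigned long val = 0;

		while (pos < len && isdigit((unsigned char)buf[pos]))
			val = val * 10 + (unsigned long)(buf[pos++] - '0');
		if (pos == start)
			break;		/* malformed input in this sketch */
		if (pos == len && truncated)
			return start;	/* possibly mid-number: reset */

		out[(*n_out)++] = val;
		if (pos < len && buf[pos] == ',')
			pos++;
		else if (pos < len)
			break;		/* unexpected separator */
	}
	return pos;
}

/* Feed the input to parse_chunk() piece by piece, as a caller would. */
static size_t parse_all(const char *input, unsigned long *out, size_t max_out)
{
	size_t len = strlen(input), off = 0, n = 0;

	while (off < len) {
		size_t left = len - off;
		int truncated = left > CHUNK;
		size_t used = parse_chunk(input + off, truncated ? CHUNK : left,
					  truncated, out, &n, max_out);

		if (used == 0)
			break;	/* no progress: bad input or oversized number */
		off += used;
	}
	return n;
}
```

With the input "10,200,3", the first chunk is "10,20". Parsing it without
the truncation check records 20, which is the misparse described above;
with the check, the parser stops after "10," and the next pass sees "200,3"
intact.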

Link: http://lkml.kernel.org/r/20190320222831.8243-7-mcgrof@kernel.org
Signed-off-by: Eric Sandeen
Signed-off-by: Luis Chamberlain
Acked-by: Kees Cook
Cc: Eric Sandeen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric Sandeen
2019-05-15 10:52:51 +0800
2ea622b88 tools/testing/selftests/sysctl/sysctl.sh: add proc_do_large_bitmap() test case

The kernel has only two users of proc_do_large_bitmap(): the kernel CPU
watchdog and ip_local_reserved_ports. Refer to watchdog_cpumask
and ip_local_reserved_ports in Documentation for further details on
these. When the input written to either of them is larger than
PAGE_SIZE - 1, the input data gets misparsed, and the user is
incorrectly informed that the desired input value was set. This commit
implements a test which mimics and exploits that use case; it uses a
bitmap size, as in the watchdog case. The bitmap is used to test the
bitmap proc handler, proc_do_large_bitmap().

The next commit fixes this issue.

[akpm@linux-foundation.org: move proc_do_large_bitmap() export to EOF]
[mcgrof@kernel.org: use new target description for backward compatibility]
[mcgrof@kernel.org: augment test number to 50, ran into issues with bash string comparisons when testing up to 50 cases.]
[mcgrof@kernel.org: introduce and use verify_diff_proc_file() to use diff]
[mcgrof@kernel.org: use mktemp for tmp file]
[mcgrof@kernel.org: merge shell test and C code]
[mcgrof@kernel.org: commit log love]
[mcgrof@kernel.org: export proc_do_large_bitmap() to allow for the test]
[mcgrof@kernel.org: check for the return value when writing to the proc file]
Link: http://lkml.kernel.org/r/20190320222831.8243-6-mcgrof@kernel.org
Signed-off-by: Eric Sandeen
Signed-off-by: Luis Chamberlain
Acked-by: Kees Cook
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric Sandeen
2019-05-15 10:52:51 +0800
a0edef796 tools/testing/selftests/sysctl/sysctl.sh: allow graceful use on older kernels

On older kernels, newer test knobs implemented in the test_sysctl
module may not be available. This is expected, and the selftest
scripts should be able to run without failures on older kernels.

Generalize the solution: require each test description to annotate its
test target file, and check for that file before running the test. If
the target file does not exist, we skip the test gracefully.
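
The actual selftest is a shell script; the following C sketch only
illustrates the pattern of pairing each test with its target file and
skipping when that file is absent. The struct test_desc layout, the
run_test() helper and the always_ok() test body are all hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

enum test_result { TEST_PASS, TEST_FAIL, TEST_SKIP };

/* One test description, annotated with the file the test depends on. */
struct test_desc {
	const char *name;
	const char *target;	/* knob that must exist for the test to run */
	int (*run)(void);	/* returns 0 on success */
};

/* Placeholder test body used for illustration only. */
static int always_ok(void)
{
	return 0;
}

/* Run the test only if its target file exists; otherwise skip it. */
static enum test_result run_test(const struct test_desc *t)
{
	assert(t && t->name && t->target && t->run);
	if (access(t->target, F_OK) != 0) {
		printf("%s: target %s not found, skipping\n", t->name, t->target);
		return TEST_SKIP;
	}
	return t->run() == 0 ? TEST_PASS : TEST_FAIL;
}
```

Reporting a skip as a distinct result, rather than as a failure, is what
lets the same test list run cleanly on kernels that lack some knobs.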

Link: http://lkml.kernel.org/r/20190320222831.8243-5-mcgrof@kernel.org
Signed-off-by: Luis Chamberlain
Acked-by: Kees Cook
Cc: Eric Sandeen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Luis Chamberlain
2019-05-15 10:52:51 +0800