15 May, 2019

40 commits

  • Right now, when somebody needs to know the recursive memory statistics
    and events of a cgroup subtree, they need to walk the entire subtree and
    sum up the counters manually.

    There are two issues with this:

    1. When a cgroup gets deleted, its stats are lost. The state counters
    should all be 0 at that point, of course, but the events are not.
    When this happens, the event counters, which are supposed to be
    monotonic, can go backwards in the parent cgroups.

    2. During regular operation, we always have a certain number of lazily
    freed cgroups sitting around that have been deleted, have no tasks,
    but have a few cache pages remaining. These groups' statistics do not
    change until we eventually hit memory pressure, but somebody
    watching, say, memory.stat on an ancestor has to iterate those every
    time.

    This patch addresses both issues by introducing recursive counters at
    each level that are propagated from the write side when stats change.

    Upward propagation happens when the per-cpu caches spill over into the
    local atomic counter. This is the same thing we do during charge and
    uncharge, except that the latter uses atomic RMWs, which are more
    expensive; stat changes happen at around the same rate. In a sparse
    file test (page faults and reclaim at maximum CPU speed) with 5 cgroup
    nesting levels, perf shows __mod_memcg_page state at ~1%.
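
    The spill-over described above can be sketched as follows. This is a
    minimal user-space model, not the kernel code: the class name, the
    field names and the BATCH value are assumptions made for illustration,
    and plain integers stand in for the per-cpu and atomic counters.

```python
BATCH = 32  # assumed per-cpu batch size; the real value is not stated above


class Memcg:
    """Toy cgroup with local and recursive counters (names are illustrative)."""

    def __init__(self, parent=None, ncpus=2):
        self.parent = parent
        self.percpu = [0] * ncpus   # cheap per-cpu cache, no shared writes
        self.local = 0              # this group's own count
        self.recursive = 0          # this group plus all of its descendants

    def mod_state(self, cpu, val):
        x = self.percpu[cpu] + val
        if abs(x) > BATCH:
            # Spill: fold the cached delta into the local counter and
            # propagate it to the recursive counter of every ancestor.
            self.local += x
            node = self
            while node is not None:
                node.recursive += x
                node = node.parent
            x = 0
        self.percpu[cpu] = x


root = Memcg()
child = Memcg(parent=root)
for _ in range(100):
    child.mod_state(cpu=0, val=1)
```

    After 100 single-page updates on one CPU, three batches of 33 have
    been spilled, so the child and the root both report 99 recursively
    while one update is still pending in the per-cpu cache.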

    Link: http://lkml.kernel.org/r/20190412151507.2769-4-hannes@cmpxchg.org
    Signed-off-by: Johannes Weiner
    Reviewed-by: Shakeel Butt
    Reviewed-by: Roman Gushchin
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • These are getting too big to be inlined in every callsite. They were
    stolen from vmstat.c, which already out-of-lines them, and they have
    only been growing since. The callsites aren't that hot, either.

    Move __mod_memcg_state(), __mod_lruvec_state() and
    __count_memcg_events() out of line and add kerneldoc comments.

    Link: http://lkml.kernel.org/r/20190412151507.2769-3-hannes@cmpxchg.org
    Signed-off-by: Johannes Weiner
    Reviewed-by: Shakeel Butt
    Reviewed-by: Roman Gushchin
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Patch series "mm: memcontrol: memory.stat cost & correctness".

    The cgroup memory.stat file holds recursive statistics for the entire
    subtree. The current implementation does this tree walk on-demand
    whenever the file is read. This is giving us problems in production.

    1. The cost of aggregating the statistics on-demand is high. A lot of
    system service cgroups are mostly idle and their stats don't change
    between reads, yet we always have to check them. There are also always
    some lazily-dying cgroups sitting around that are pinned by a handful
    of remaining page cache; the same applies to them.

    In an application that periodically monitors memory.stat in our
    fleet, we have seen the aggregation consume up to 5% CPU time.

    2. When cgroups die and disappear from the cgroup tree, so do their
    accumulated vm events. The result is that the event counters at
    higher-level cgroups can go backwards and confuse some of our
    automation, let alone people looking at the graphs over time.

    To address both issues, this patch series changes the stat
    implementation to spill counts upwards when the counters change.

    The upward spilling is batched using the existing per-cpu cache. In a
    sparse file stress test with 5 level cgroup nesting, the additional cost
    of the flushing was negligible (a little under 1% of CPU at 100% CPU
    utilization, compared to the 5% of reading memory.stat during regular
    operation).
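
    The two problems and the write-side fix can be illustrated with a toy
    model. Everything here is hypothetical (class, method and field names),
    and the per-cpu batching is left out to keep the contrast between the
    on-demand walk and the spilled counters visible.

```python
class Group:
    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        self.local_events = 0       # events charged to this group only
        self.recursive_events = 0   # maintained on the write side
        if parent is not None:
            parent.children.append(self)

    def count_event(self, n=1):
        self.local_events += n
        node = self
        while node is not None:     # spill upwards when the counter changes
            node.recursive_events += n
            node = node.parent

    def walk_events(self):
        # On-demand aggregation: visits every live descendant on each read.
        return self.local_events + sum(c.walk_events() for c in self.children)

    def delete(self):
        self.parent.children.remove(self)


root = Group()
a, b = Group(root), Group(root)
a.count_event(5)
b.count_event(3)
before_walk, before_rec = root.walk_events(), root.recursive_events
a.delete()
after_walk, after_rec = root.walk_events(), root.recursive_events
```

    In this model the walked total at the root drops from 8 to 3 once the
    child is deleted, while the spilled counter stays at 8.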

    This patch (of 4):

    memcg_page_state(), lruvec_page_state(), memcg_sum_events() are
    currently returning the state of the local memcg or lruvec, not the
    recursive state.

    In practice there is a demand for both versions, although the callers
    that want the recursive counts currently sum them up by hand.

    By default, cgroups are considered recursive entities and generally we
    expect more users of the recursive counters, with the local counts being
    special cases. To reflect that in the name, add a _local suffix to the
    current implementations.

    The following patch will re-incarnate these functions with recursive
    semantics, but with an O(1) implementation.

    [hannes@cmpxchg.org: fix bisection hole]
    Link: http://lkml.kernel.org/r/20190417160347.GC23013@cmpxchg.org
    Link: http://lkml.kernel.org/r/20190412151507.2769-2-hannes@cmpxchg.org
    Signed-off-by: Johannes Weiner
    Reviewed-by: Shakeel Butt
    Reviewed-by: Roman Gushchin
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • The "param.count" value is a u64 that comes from the user. The code
    later in the function assumes that param.count is at least one; if it
    is not, we Oops when we dereference the ZERO_SIZE_PTR.

    Also the addition can have an integer overflow which would lead us to
    allocate a smaller "pages" array than required. I can't immediately
    tell what the possible runtime implications are, but it's safest to
    prevent the overflow.
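
    A sketch of why the wrap-around matters, emulating 64-bit unsigned
    arithmetic in user space. The exact expression used by the driver is
    not quoted in this message, so the "offset" term, the 4 KiB page size
    and the function names are assumptions for illustration only.

```python
U64_MAX = 2**64 - 1
PAGE_SHIFT = 12              # assumed 4 KiB pages
PAGE_SIZE = 1 << PAGE_SHIFT


def num_pages_unchecked(count, offset):
    # Emulates unsigned 64-bit arithmetic: the sum silently wraps around.
    return ((count + offset + PAGE_SIZE - 1) & U64_MAX) >> PAGE_SHIFT


def num_pages_checked(count, offset):
    # Reject zero and any count whose rounded-up sum would exceed U64_MAX.
    if count == 0 or count > U64_MAX - offset - PAGE_SIZE + 1:
        raise ValueError("invalid count")
    return (count + offset + PAGE_SIZE - 1) >> PAGE_SHIFT
```

    With a count close to U64_MAX the unchecked variant yields a page
    count of 0 or 1, far smaller than the request, whereas the checked
    variant refuses the request up front.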

    Link: http://lkml.kernel.org/r/20181218082129.GE32567@kadam
    Fixes: 6db7199407ca ("drivers/virt: introduce Freescale hypervisor management driver")
    Signed-off-by: Dan Carpenter
    Reviewed-by: Andrew Morton
    Cc: Timur Tabi
    Cc: Mihai Caraman
    Cc: Kumar Gala
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Carpenter
     
  • strndup_user() returns error pointers on error, and then in the error
    handling we pass the error pointers to kfree(). It will cause an Oops.

    Link: http://lkml.kernel.org/r/20181218082003.GD32567@kadam
    Fixes: 6db7199407ca ("drivers/virt: introduce Freescale hypervisor management driver")
    Signed-off-by: Dan Carpenter
    Reviewed-by: Andrew Morton
    Cc: Timur Tabi
    Cc: Mihai Caraman
    Cc: Kumar Gala
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Carpenter
     
  • I spent literally an hour trying to work out why an earlier version of
    my memory.events aggregation code doesn't work properly, only to find
    out I was calling memcg->events instead of memcg->memory_events, which
    is fairly confusing.

    This naming seems in need of reworking, so make it harder to do the
    wrong thing by using vmevents instead of events, which makes it more
    clear that these are vm counters rather than memcg-specific counters.

    There are also a few other inconsistent names in both the percpu and
    aggregated structs, so these are all cleaned up to be more coherent and
    easy to understand.

    This commit contains code cleanup only: there are no logic changes.

    [akpm@linux-foundation.org: fix it for preceding changes]
    Link: http://lkml.kernel.org/r/20190208224319.GA23801@chrisdown.name
    Signed-off-by: Chris Down
    Acked-by: Johannes Weiner
    Cc: Michal Hocko
    Cc: Tejun Heo
    Cc: Roman Gushchin
    Cc: Dennis Zhou
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chris Down
     
  • Now that all instances of #include have been replaced with
    #include , we can remove these.

    Link: http://lkml.kernel.org/r/1553267665-27228-2-git-send-email-yamada.masahiro@socionext.com
    Signed-off-by: Masahiro Yamada
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Masahiro Yamada
     
  • Since commit dccd2304cc90 ("ARM: 7430/1: sizes.h: move from asm-generic
    to "), and are just
    wrappers of .

    This commit replaces all and to
    prepare for the removal.

    Link: http://lkml.kernel.org/r/1553267665-27228-1-git-send-email-yamada.masahiro@socionext.com
    Signed-off-by: Masahiro Yamada
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Masahiro Yamada
     
  • linux/dax.h is included more than once.

    Link: http://lkml.kernel.org/r/5c867e95.1c69fb81.4f15a.e5e4@mx.google.com
    Signed-off-by: Sabyasachi Gupta
    Acked-by: Souptick Joarder
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sabyasachi Gupta
     
  • linux/xattr.h is included more than once.

    Link: http://lkml.kernel.org/r/5c86803d.1c69fb81.1a7c6.2b78@mx.google.com
    Signed-off-by: Sabyasachi Gupta
    Acked-by: Souptick Joarder
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sabyasachi Gupta
     
  • This file uses "task" 85 times and "tsk" 25 times. It is better to be
    consistent.

    Link: http://lkml.kernel.org/r/20181129180547.15976-1-avagin@gmail.com
    Signed-off-by: Andrei Vagin
    Reviewed-by: Andrew Morton
    Cc: Oleg Nesterov
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrei Vagin
     
  • linux/poll.h is included more than once.

    Link: http://lkml.kernel.org/r/5c86820f.1c69fb81.149f0.0834@mx.google.com
    Signed-off-by: Sabyasachi Gupta
    Acked-by: Souptick Joarder
    Cc: Jan Harkes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sabyasachi Gupta
     
  • For ipcmni_extend mode, the sequence number space is only 7 bits. So
    the chance of id reuse is relatively high compared with the non-extended
    mode.

    To alleviate this id reuse problem, this patch enables cyclic allocation
    for the index to the radix tree (idx). The disadvantage is that this
    can cause a slight slow-down of the fast path, as the radix tree could
    be higher than necessary.

    To limit the radix tree height, I have chosen the following limits:
    1) The cycling is done over in_use*1.5.
    2) At least, the cycling is done over
       "normal" ipcmni mode: RADIX_TREE_MAP_SIZE elements
       "ipcmni_extended": 4096 elements

    Result:
    - for normal mode:
    No change for 4095 active objects until the 3rd level
    is added without cyclic allocation.

    For a 2-level radix tree compared to a 1-level radix tree, I have
    observed < 1% performance impact.
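
    The limits chosen above for the cycling range can be sketched like
    this. It is a simplified model: RADIX_TREE_MAP_SIZE = 64 is an assumed
    value that depends on the kernel configuration, capping the range at
    the id limit is an assumption, and alloc_cyclic() is only a stand-in
    for the behaviour of idr_alloc_cyclic().

```python
RADIX_TREE_MAP_SIZE = 64   # assumed value; depends on kernel configuration
EXTENDED_MIN_CYCLE = 4096


def cycle_window(in_use, extended, ipc_mni):
    """Upper bound of the index range that cyclic allocation wraps over."""
    min_cycle = EXTENDED_MIN_CYCLE if extended else RADIX_TREE_MAP_SIZE
    window = max(in_use * 3 // 2, min_cycle)   # in_use * 1.5, integer math
    return min(window, ipc_mni)                # assumed cap at the id limit


def alloc_cyclic(used, next_hint, window):
    """Return the first free index at or after next_hint, wrapping once."""
    for step in range(window):
        idx = (next_hint + step) % window
        if idx not in used:
            return idx
    return None
```

    Keeping the window small while few objects are in use is what bounds
    the tree height; the window only grows once in_use*1.5 exceeds the
    minimum.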

    Notes:
    1) Normal "x=semget();y=semget();" is unaffected: Then the idx
    is e.g. a and a+1, regardless if idr_alloc() or idr_alloc_cyclic()
    is used.

    2) The -1% happens in a microbenchmark after this situation:
    x=semget();
    for(i=0;i<
    Acked-by: Waiman Long
    Cc: "Luis R. Rodriguez"
    Cc: Kees Cook
    Cc: Jonathan Corbet
    Cc: Al Viro
    Cc: Matthew Wilcox
    Cc: "Eric W . Biederman"
    Cc: Takashi Iwai
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • Rewrite, based on the patch from Waiman Long:

    The mixing in of a sequence number into the IPC IDs is probably to avoid
    ID reuse in userspace as much as possible. With ipcmni_extend mode, the
    number of usable sequence numbers is greatly reduced leading to higher
    chance of ID reuse.

    To address this issue, we need to conserve the sequence number space as
    much as possible. Right now, the sequence number is incremented for
    every new ID created. In reality, we only need to increment the
    sequence number when the newly allocated ID is not greater than the
    last one allocated. Only in that case can the new ID collide with an
    existing one. This is being done irrespective of the ipcmni mode.

    In order to avoid any races, the index is first allocated and then the
    pointer is replaced.
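
    A minimal model of the rule above, assuming the id is built as
    (seq << shift) + idx. The class name, the wrap handling and the
    constructor parameters are illustrative, and the idr_alloc() and
    idr_replace() steps are not modelled.

```python
class IdAllocator:
    def __init__(self, seq_shift, seq_max):
        self.seq_shift = seq_shift
        self.seq_max = seq_max      # assumed wrap limit for the sequence
        self.seq = 0
        self.last_idx = -1

    def new_id(self, idx):
        # Bump the sequence number only when the index did not move
        # forward; only then could (seq, idx) repeat an earlier id.
        if idx <= self.last_idx:
            self.seq += 1
            if self.seq >= self.seq_max:
                self.seq = 0
        self.last_idx = idx
        return (self.seq << self.seq_shift) + idx


alloc = IdAllocator(seq_shift=15, seq_max=4)
ids = [alloc.new_id(idx) for idx in (0, 1, 2, 1, 5, 5)]
```

    Indices 0, 1 and 2 share sequence number 0; the sequence is bumped
    only when index 1 is handed out again and when index 5 repeats.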

    Changes compared to the initial patch:
    - Handle failures from idr_alloc().
    - Prevent concurrent operations from seeing the wrong sequence number.
    (This is achieved by using idr_replace()).
    - IPCMNI_SEQ_SHIFT is not a constant, thus renamed to
    ipcmni_seq_shift().
    - IPCMNI_SEQ_MAX is not a constant, thus renamed to ipcmni_seq_max().

    Link: http://lkml.kernel.org/r/20190329204930.21620-2-longman@redhat.com
    Signed-off-by: Manfred Spraul
    Signed-off-by: Waiman Long
    Suggested-by: Matthew Wilcox
    Acked-by: Waiman Long
    Cc: Al Viro
    Cc: Davidlohr Bueso
    Cc: "Eric W . Biederman"
    Cc: Jonathan Corbet
    Cc: Kees Cook
    Cc: "Luis R. Rodriguez"
    Cc: Takashi Iwai
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • The maximum number of unique System V IPC identifiers was limited to
    32k. That limit should be big enough for most use cases.

    However, there are some users out there requesting more, especially
    those that are migrating from Solaris which uses 24 bits for unique
    identifiers. To satisfy the need of those users, a new boot time kernel
    option "ipcmni_extend" is added to extend the IPCMNI value to 16M. This
    is a 512X increase which should be big enough for users out there that
    need a large number of unique IPC identifiers.

    The use of this new option will change the pattern of the IPC
    identifiers returned by functions like shmget(2). An application that
    depends on that pattern may not work properly. So it should only be
    used if the users really need more than 32k unique IPC identifiers.

    This new option does have the side effect of reducing the maximum number
    of unique sequence numbers from 64k down to 128. So it is a trade-off.
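
    The trade-off is plain bit arithmetic. The sketch below assumes that
    an IPC id is a non-negative 32-bit int, so 31 bits are shared between
    the index and the sequence number; the shift values 15 and 24 follow
    from the 32k and 16M (24 bit) figures above, and the names are
    illustrative.

```python
INT_MAX = 2**31 - 1      # assumption: ids are non-negative C ints
DEFAULT_SHIFT = 15       # 2**15 = 32k index slots
EXTENDED_SHIFT = 24      # 2**24 = 16M index slots


def id_space(shift):
    max_ids = 1 << shift
    seq_values = (INT_MAX >> shift) + 1   # sequence values 0 .. INT_MAX >> shift
    return max_ids, seq_values


default_ids, default_seqs = id_space(DEFAULT_SHIFT)
extended_ids, extended_seqs = id_space(EXTENDED_SHIFT)
```

    Every bit moved from the sequence number to the index doubles the
    number of identifiers and halves the number of sequence values, which
    gives the 512X increase on one side and the drop from 64k to 128 on
    the other.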

    The computation of a new IPC id is not done in the performance critical
    path. So a little bit of additional overhead shouldn't have any real
    performance impact.

    Link: http://lkml.kernel.org/r/20190329204930.21620-1-longman@redhat.com
    Signed-off-by: Waiman Long
    Acked-by: Manfred Spraul
    Cc: Al Viro
    Cc: Davidlohr Bueso
    Cc: "Eric W . Biederman"
    Cc: Jonathan Corbet
    Cc: Kees Cook
    Cc: "Luis R. Rodriguez"
    Cc: Matthew Wilcox
    Cc: Takashi Iwai
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Waiman Long
     
  • Our msg priorities became an rbtree as of d6629859b36d ("ipc/mqueue:
    improve performance of send/recv"). However, consuming a msg in
    msg_get() remains logarithmic (still better than the case before, of
    course). By applying well-known techniques to cache pointers we can
    get the node with the highest priority in O(1), which is especially
    nice for the rt cases. Furthermore, some callers can call msg_get() in
    a loop.

    A new msg_tree_erase() helper is also added to encapsulate the tree
    removal and node_cache game. Passes ltp mq testcases.
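
    The pointer-caching idea can be sketched with an unbalanced binary
    search tree standing in for the rbtree. All names are hypothetical and
    rebalancing is omitted; the point is only how the cached rightmost
    node is kept up to date on insert and on removal of the maximum.

```python
class Node:
    def __init__(self, prio):
        self.prio = prio
        self.left = self.right = self.parent = None


class PrioTree:
    """Unbalanced BST standing in for the rbtree; rightmost is cached."""

    def __init__(self):
        self.root = None
        self.rightmost = None

    def insert(self, prio):
        node, parent, went_only_right = Node(prio), None, True
        cur = self.root
        while cur is not None:
            parent = cur
            if prio < cur.prio:
                cur, went_only_right = cur.left, False
            else:
                cur = cur.right
        node.parent = parent
        if parent is None:
            self.root = node
        elif prio < parent.prio:
            parent.left = node
        else:
            parent.right = node
        if went_only_right:
            self.rightmost = node       # new maximum: refresh the cache

    def pop_max(self):
        node = self.rightmost           # O(1) lookup instead of a descent
        if node is None:
            return None
        # The maximum has no right child, so splice in its left subtree.
        child, parent = node.left, node.parent
        if child is not None:
            child.parent = parent
        if parent is None:
            self.root = child
        else:
            parent.right = child
        # New maximum: rightmost node of the left subtree, else the parent.
        if child is not None:
            while child.right is not None:
                child = child.right
            self.rightmost = child
        else:
            self.rightmost = parent
        return node.prio


tree = PrioTree()
for prio in (5, 1, 9, 7, 3):
    tree.insert(prio)
popped = [tree.pop_max() for _ in range(6)]
```

    Finding the highest priority no longer walks the tree; only the cache
    update after a removal may descend into the left subtree.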

    Link: http://lkml.kernel.org/r/20190321190216.1719-2-dave@stgolabs.net
    Signed-off-by: Davidlohr Bueso
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • We already store the current task for the new waiter before calling
    wq_sleep() in both send and recv paths. Trivially remove the redundant
    assignment.

    Link: http://lkml.kernel.org/r/20190321190216.1719-1-dave@stgolabs.net
    Signed-off-by: Davidlohr Bueso
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • msgctl10 of ltp triggers the lockup shown below. When CONFIG_KASAN is
    enabled on large-memory SMP systems, page initialization can take a
    long time. If msgctl10 then requests a huge block of memory, this
    blocks the RCU scheduler, so release the CPU actively.

    After adding schedule() in free_msg(), free_msg() can no longer be
    called while holding a spinlock. Add the messages to a temporary list
    and free them outside of the spinlock.

    rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
    rcu: Tasks blocked on level-1 rcu_node (CPUs 16-31): P32505
    rcu: Tasks blocked on level-1 rcu_node (CPUs 48-63): P34978
    rcu: (detected by 11, t=35024 jiffies, g=44237529, q=16542267)
    msgctl10 R running task 21608 32505 2794 0x00000082
    Call Trace:
    preempt_schedule_irq+0x4c/0xb0
    retint_kernel+0x1b/0x2d
    RIP: 0010:__is_insn_slot_addr+0xfb/0x250
    Code: 82 1d 00 48 8b 9b 90 00 00 00 4c 89 f7 49 c1 ee 03 e8 59 83 1d 00 48 b8 00 00 00 00 00 fc ff df 4c 39 eb 48 89 9d 58 ff ff ff c6 04 06 f8 74 66 4c 8d 75 98 4c 89 f1 48 c1 e9 03 48 01 c8 48
    RSP: 0018:ffff88bce041f758 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
    RAX: dffffc0000000000 RBX: ffffffff8471bc50 RCX: ffffffff828a2a57
    RDX: dffffc0000000000 RSI: dffffc0000000000 RDI: ffff88bce041f780
    RBP: ffff88bce041f828 R08: ffffed15f3f4c5b3 R09: ffffed15f3f4c5b3
    R10: 0000000000000001 R11: ffffed15f3f4c5b2 R12: 000000318aee9b73
    R13: ffffffff8471bc50 R14: 1ffff1179c083ef0 R15: 1ffff1179c083eec
    kernel_text_address+0xc1/0x100
    __kernel_text_address+0xe/0x30
    unwind_get_return_address+0x2f/0x50
    __save_stack_trace+0x92/0x100
    create_object+0x380/0x650
    __kmalloc+0x14c/0x2b0
    load_msg+0x38/0x1a0
    do_msgsnd+0x19e/0xcf0
    do_syscall_64+0x117/0x400
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
    rcu: Tasks blocked on level-1 rcu_node (CPUs 0-15): P32170
    rcu: (detected by 14, t=35016 jiffies, g=44237525, q=12423063)
    msgctl10 R running task 21608 32170 32155 0x00000082
    Call Trace:
    preempt_schedule_irq+0x4c/0xb0
    retint_kernel+0x1b/0x2d
    RIP: 0010:lock_acquire+0x4d/0x340
    Code: 48 81 ec c0 00 00 00 45 89 c6 4d 89 cf 48 8d 6c 24 20 48 89 3c 24 48 8d bb e4 0c 00 00 89 74 24 0c 48 c7 44 24 20 b3 8a b5 41 c1 ed 03 48 c7 44 24 28 b4 25 18 84 48 c7 44 24 30 d0 54 7a 82
    RSP: 0018:ffff88af83417738 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
    RAX: dffffc0000000000 RBX: ffff88bd335f3080 RCX: 0000000000000002
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88bd335f3d64
    RBP: ffff88af83417758 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000001 R11: ffffed13f3f745b2 R12: 0000000000000000
    R13: 0000000000000002 R14: 0000000000000000 R15: 0000000000000000
    is_bpf_text_address+0x32/0xe0
    kernel_text_address+0xec/0x100
    __kernel_text_address+0xe/0x30
    unwind_get_return_address+0x2f/0x50
    __save_stack_trace+0x92/0x100
    save_stack+0x32/0xb0
    __kasan_slab_free+0x130/0x180
    kfree+0xfa/0x2d0
    free_msg+0x24/0x50
    do_msgrcv+0x508/0xe60
    do_syscall_64+0x117/0x400
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Davidlohr said:
    "So after releasing the lock, the msg rbtree/list is empty and new
    calls will not see those in the newly populated tmp_msg list, and
    therefore they cannot access the delayed msg freeing pointers, which
    is good. Also the fact that the node_cache is now freed before the
    actual messages seems to be harmless as this is wanted for
    msg_insert() avoiding GFP_ATOMIC allocations, and after releasing the
    info->lock the thing is freed anyway so it should not change things"
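
    The pattern of detaching under the lock and freeing afterwards can be
    sketched in user space. This is not the mqueue code: the Queue class
    and its fields are hypothetical, and a threading.Lock stands in for
    the spinlock.

```python
import threading


class Queue:
    def __init__(self):
        self.lock = threading.Lock()   # stands in for the spinlock
        self.messages = []

    def drain(self, free_msg):
        with self.lock:
            # Under the lock: only detach the messages, which is cheap.
            tmp, self.messages = self.messages, []
        # Lock dropped: freeing may now take long or yield the CPU.
        for msg in tmp:
            free_msg(msg)
        return len(tmp)


q = Queue()
q.messages.extend(["m1", "m2"])
freed = []
lock_held_during_free = []


def free_msg(msg):
    lock_held_during_free.append(q.lock.locked())
    freed.append(msg)


drained = q.drain(free_msg)
```

    New callers that take the lock after the swap see an empty queue, so
    they cannot reach the messages that are still waiting to be freed.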

    Link: http://lkml.kernel.org/r/1552029161-4957-1-git-send-email-lirongqing@baidu.com
    Signed-off-by: Li RongQing
    Signed-off-by: Zhang Yu
    Reviewed-by: Davidlohr Bueso
    Cc: Manfred Spraul
    Cc: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Rongqing
     
  • The clk rate is always stored in clk_core but might be out of date and
    require calls to update from hardware.

    Deal with that case by printing a (c) suffix.

    Link: http://lkml.kernel.org/r/1a474318982a5f0125f2360c4161029b17f56bd1.1556881728.git.leonard.crestez@nxp.com
    Signed-off-by: Leonard Crestez
    Cc: Jan Kiszka
    Cc: Jason Wessel
    Cc: Kieran Bingham
    Cc: Stephen Boyd
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Leonard Crestez
     
  • An incorrect argument to list_for_each is an internal error in gdb
    scripts so a TypeError should be raised. The gdb.GdbError exception
    type is intended for user errors such as incorrect invocation.

    Drop the type assertion in list_for_each_entry because list_for_each
    isn't going to suddenly yield something else.

    Applies to both list and hlist.

    Link: http://lkml.kernel.org/r/c1d3fd4db13d999a3ba57f5bbc1924862d824f61.1556881728.git.leonard.crestez@nxp.com
    Signed-off-by: Leonard Crestez
    Reviewed-by: Stephen Boyd
    Cc: Jan Kiszka
    Cc: Jason Wessel
    Cc: Kieran Bingham
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Leonard Crestez
     
  • Finding an individual clk_core requires walking the tree which can be
    quite complicated so add a helper for easy access.

    (gdb) print *(struct clk_scu*)$lx_clk_core_lookup("uart0_clk")->hw

    Link: http://lkml.kernel.org/r/Message-ID:
    Signed-off-by: Leonard Crestez
    Cc: Jan Kiszka
    Cc: Jason Wessel
    Cc: Kieran Bingham
    Cc: Stephen Boyd
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Leonard Crestez
     
  • Add an lx-clk-summary command which prints a subset of
    /sys/kernel/debug/clk/clk_summary.

    This can be used to examine hangs caused by clk not being enabled.

    Link: http://lkml.kernel.org/r/Message-ID:
    Signed-off-by: Leonard Crestez
    Cc: Jan Kiszka
    Cc: Jason Wessel
    Cc: Kieran Bingham
    Cc: Stephen Boyd
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Leonard Crestez
     
  • This allows easily examining kernel hlists in python.

    Link: http://lkml.kernel.org/r/Message-ID:
    Signed-off-by: Leonard Crestez
    Reviewed-by: Stephen Boyd
    Cc: Jason Wessel
    Cc: Jan Kiszka
    Cc: Kieran Bingham
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Leonard Crestez
     
  • These scripts have some pep8 style warnings. Fix them up so that this
    directory is all pep8 clean.

    Link: http://lkml.kernel.org/r/20190329220844.38234-6-swboyd@chromium.org
    Signed-off-by: Stephen Boyd
    Cc: Douglas Anderson
    Cc: Nikolay Borisov
    Cc: Kieran Bingham
    Cc: Jan Kiszka
    Cc: Jackie Liu
    Cc: Jason Wessel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Boyd
     
  • Implement a command to print the timer list, much like how
    /proc/timer_list is implemented. This can be used to look at the
    pending timers on a crashed system.

    [swboyd@chromium.org: v2]
    Link: http://lkml.kernel.org/r/20190329220844.38234-5-swboyd@chromium.org
    Link: http://lkml.kernel.org/r/20190325184522.260535-5-swboyd@chromium.org
    Signed-off-by: Stephen Boyd
    Cc: Douglas Anderson
    Cc: Nikolay Borisov
    Cc: Kieran Bingham
    Cc: Jan Kiszka
    Cc: Jackie Liu
    Cc: Jason Wessel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Boyd
     
  • Implement gdb functions for rb_first(), rb_last(), rb_next(), and
    rb_prev(). These can be useful to iterate through the kernel's
    red-black trees.
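
    The traversal logic behind these helpers can be sketched with plain
    Python objects. This is an assumption-laden model, not the gdb script:
    the real helpers operate on gdb values, and the kernel packs the
    parent pointer and the colour into one word, whereas RbNode and link()
    here are hypothetical and use a plain parent attribute.

```python
class RbNode:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None


def rb_first(root):
    node = root
    while node is not None and node.left is not None:
        node = node.left
    return node


def rb_last(root):
    node = root
    while node is not None and node.right is not None:
        node = node.right
    return node


def rb_next(node):
    if node.right is not None:
        # Successor is the leftmost node of the right subtree.
        node = node.right
        while node.left is not None:
            node = node.left
        return node
    # Otherwise climb until we arrive from a left child.
    parent = node.parent
    while parent is not None and node is parent.right:
        node, parent = parent, parent.parent
    return parent


def rb_prev(node):
    if node.left is not None:
        node = node.left
        while node.right is not None:
            node = node.right
        return node
    parent = node.parent
    while parent is not None and node is parent.left:
        node, parent = parent, parent.parent
    return parent


def link(parent, child, side):
    setattr(parent, side, child)
    child.parent = parent
    return child


root = RbNode(4)
two = link(root, RbNode(2), "left")
six = link(root, RbNode(6), "right")
link(two, RbNode(1), "left")
link(two, RbNode(3), "right")
link(six, RbNode(5), "left")

forward, node = [], rb_first(root)
while node is not None:
    forward.append(node.key)
    node = rb_next(node)

backward, node = [], rb_last(root)
while node is not None:
    backward.append(node.key)
    node = rb_prev(node)
```

    Walking with rb_first() and rb_next() visits the keys in ascending
    order; rb_last() and rb_prev() give the reverse order.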

    [swboyd@chromium.org: v2]
    Link: http://lkml.kernel.org/r/20190329220844.38234-4-swboyd@chromium.org
    Link: http://lkml.kernel.org/r/20190325184522.260535-4-swboyd@chromium.org
    Signed-off-by: Stephen Boyd
    Cc: Douglas Anderson
    Cc: Nikolay Borisov
    Cc: Kieran Bingham
    Cc: Jan Kiszka
    Cc: Jackie Liu
    Cc: Jason Wessel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Boyd
     
  • lx-configdump dumps the contents of the gzipped .config to a text
    file when the config is included in the kernel with CONFIG_IKCONFIG. By
    default, the file written is called config.txt, but it can be any user
    supplied filename as well. If the kernel config is in a module
    (configs.ko), then it can be loaded along with symbols for the module
    loaded with 'lx-symbols' and then this command will still work.
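
    The core of such a dump can be sketched outside gdb. This is a hedged
    illustration that works on a byte string instead of target memory; the
    marker strings, the function names and the synthetic sample are
    assumptions and are not taken from this message.

```python
import gzip

START, END = b"IKCFG_ST", b"IKCFG_ED"   # assumed marker strings


def extract_config(blob):
    """Return the decompressed .config text embedded in blob."""
    begin = blob.index(START) + len(START)
    end = blob.index(END, begin)
    return gzip.decompress(blob[begin:end]).decode()


def dump_config(blob, filename="config.txt"):
    # Default file name follows the description above.
    with open(filename, "w") as f:
        f.write(extract_config(blob))
    return filename


# Synthetic stand-in for the config data held in kernel memory.
sample = b"\x00pad" + START + gzip.compress(b"CONFIG_IKCONFIG=y\n") + END + b"pad"
text = extract_config(sample)
```

    In the gdb command the bytes would be read from the debugged kernel's
    memory rather than from a Python byte string.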

    Obviously if you have the whole vmlinux then this can also be achieved
    with scripts/extract-ikconfig, but this gdb script can be useful to
    confirm that the contents of the config in memory and the vmlinux
    contents on disk match what is expected.

    [swboyd@chromium.org: v2]
    Link: http://lkml.kernel.org/r/20190329220844.38234-3-swboyd@chromium.org
    Link: http://lkml.kernel.org/r/20190325184522.260535-3-swboyd@chromium.org
    Signed-off-by: Stephen Boyd
    Cc: Douglas Anderson
    Cc: Nikolay Borisov
    Cc: Kieran Bingham
    Cc: Jan Kiszka
    Cc: Jackie Liu
    Cc: Jason Wessel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Boyd
     
  • Patch series "gdb script for kconfig and timer list".

    This is a handful of changes to the kernel's gdb scripts to do some more
    debugging with kgdb. The first patch allows the vmlinux to be reloaded
    from where it was specified on the command line so that this set of
    scripts can be used from anywhere. The second patch adds a script to
    dump the config.gz to a file on the host debugging machine. The third
    patch adds some rb tree utilities and the last patch uses those rb tree
    walking utilities to dump out the contents of /proc/timer_list from a
    system under debug.

    This patch (of 5):

    If I run 'gdb ' and there's the vmlinux-gdb.py file
    there I can properly see symbols and use the lx commands provided by the
    GDB scripts. But once I run 'lx-symbols' at the command prompt, gdb
    reloads the vmlinux symbols assuming that this script was run from the
    directory that has vmlinux at the root. That isn't always true, but we
    could just look and see what symbols were already loaded and use that
    instead. Let's do that so this can work by being invoked anywhere.

    Link: http://lkml.kernel.org/r/20190325184522.260535-2-swboyd@chromium.org
    Signed-off-by: Stephen Boyd
    Cc: Douglas Anderson
    Cc: Nikolay Borisov
    Cc: Kieran Bingham
    Cc: Jan Kiszka
    Cc: Jackie Liu
    Cc: Jason Wessel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Boyd
     
  • This patch implements the PPS ECHO functionality for pps-gpio, which
    sysfs already claims is available.

    Configuration is done via device tree bindings.

    No changes are made to userspace interfaces.

    This patch was originally written by Lukas Senger as part of a master's
    thesis project and modified for inclusion into the Linux kernel by Tom
    Burkart.

    Link: http://lkml.kernel.org/r/20190324043305.6627-4-tom@aussec.com
    Signed-off-by: Tom Burkart
    Acked-by: Rodolfo Giometti
    Signed-off-by: Lukas Senger
    Cc: Philipp Zabel
    Cc: Rob Herring
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tom Burkart
     
  • This patch implements the device tree binding changes required for the
    PPS ECHO functionality for pps-gpio, which sysfs already claims is
    available.

    It adds two DT properties for configuring the PPS ECHO functionality.

    This patch is provided separately from the rest of the patch series, per
    Documentation/devicetree/bindings/submitting-patches.txt.

    This patch was originally written by Lukas Senger as part of a master's
    thesis project and modified for inclusion into the Linux kernel by Tom
    Burkart.

    Link: http://lkml.kernel.org/r/20190324043305.6627-3-tom@aussec.com
    Signed-off-by: Tom Burkart
    Signed-off-by: Lukas Senger
    Acked-by: Rodolfo Giometti
    Reviewed-by: Rob Herring
    Cc: Philipp Zabel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tom Burkart
     
  • This patch changes the GPIO access for the pps-gpio driver from the
    integer based API to the descriptor based API.

    The integer based API is considered deprecated and the descriptor based
    API is the preferred way to access GPIOs as per
    Documentation/driver-api/gpio/intro.rst

    No changes are made to userspace interfaces.

    Link: http://lkml.kernel.org/r/20190324043305.6627-2-tom@aussec.com
    Signed-off-by: Tom Burkart
    Acked-by: Rodolfo Giometti
    Reviewed-by: Philipp Zabel
    Cc: Lukas Senger
    Cc: Rob Herring
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tom Burkart
     
  • Allow specifying reboot_mode for panic only. This is needed on systems
    where ramoops is used to store panic logs, and user wants to use warm
    reset to preserve those, while still having cold reset on normal
    reboots.

    Link: http://lkml.kernel.org/r/20190322004735.27702-1-aaro.koskinen@iki.fi
    Signed-off-by: Aaro Koskinen
    Reviewed-by: Kees Cook
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aaro Koskinen
     
  • When kernel panic happens, it will first print the panic call stack,
    then the ending msg like:

    [ 35.743249] ---[ end Kernel panic - not syncing: Fatal exception
    [ 35.749975] ------------[ cut here ]------------

    The above messages are very useful for debugging.

    But if the system is configured not to reboot on panic, say the
    "panic_timeout" parameter equals 0, it will likely print out many noisy
    messages, such as a WARN() call stack for each and every CPU except the
    panicking one, like below:

    WARNING: CPU: 1 PID: 280 at kernel/sched/core.c:1198 set_task_cpu+0x183/0x190
    Call Trace:

    try_to_wake_up
    default_wake_function
    autoremove_wake_function
    __wake_up_common
    __wake_up_common_lock
    __wake_up
    wake_up_klogd_work_func
    irq_work_run_list
    irq_work_tick
    update_process_times
    tick_sched_timer
    __hrtimer_run_queues
    hrtimer_interrupt
    smp_apic_timer_interrupt
    apic_timer_interrupt

    For people working in console mode, the screen will first show the panic
    call stack, which is then immediately overwritten by these noisy extra
    messages. This makes debugging much more difficult, as the original
    context gets lost on screen.

    Also these noisy messages will confuse some users, as I have seen many bug
    reporters post the noisy messages into bugzilla, instead of the real
    panic call stack and context.

    Add a flag "suppress_printk" which gets set in panic() to avoid those
    noisy messages, without changing the current kernel behavior that both
    panic blinking and the sysrq magic key can work as is, as suggested by
    Petr Mladek.

    To verify this, make sure the kernel is not configured to reboot on
    panic and run the following in the console
    # echo c > /proc/sysrq-trigger
    to see if the console only prints out the panic call stack.

    Link: http://lkml.kernel.org/r/1551430186-24169-1-git-send-email-feng.tang@intel.com
    Signed-off-by: Feng Tang
    Suggested-by: Petr Mladek
    Reviewed-by: Petr Mladek
    Acked-by: Steven Rostedt (VMware)
    Acked-by: Sergey Senozhatsky
    Cc: Thomas Gleixner
    Cc: Kees Cook
    Cc: Borislav Petkov
    Cc: Andi Kleen
    Cc: Peter Zijlstra
    Cc: Greg Kroah-Hartman
    Cc: Jiri Slaby
    Cc: Sasha Levin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Feng Tang
     
  • LLVM uses profiling data that's deliberately similar to GCC, but has a
    very different way of exporting that data. LLVM calls llvm_gcov_init()
    once per module, and provides a couple of callbacks that we can use to
    ask for more data.

    We care about the "writeout" callback, which in turn calls back into
    compiler-rt/this module to dump all the gathered coverage data to disk:

    llvm_gcda_start_file()
    llvm_gcda_emit_function()
    llvm_gcda_emit_arcs()
    llvm_gcda_emit_function()
    llvm_gcda_emit_arcs()
    [... repeats for each function ...]
    llvm_gcda_summary_info()
    llvm_gcda_end_file()

    This design is much more stateless and unstructured than gcc's, and is
    intended to run at process exit. This forces us to keep some local
    state about which module we're dealing with at the moment. On the other
    hand, it also means we don't depend as much on how LLVM represents
    profiling data internally.
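
    Why local state is needed can be seen in a small model of the callback
    sequence. The class and the simplified method signatures are
    hypothetical; they only mirror the order of the llvm_gcda_* calls
    listed above and leave out the summary step.

```python
class GcovCollector:
    def __init__(self):
        self.files = {}
        self.current = None      # file being written out right now
        self.current_fn = None   # function whose arcs come next

    def start_file(self, filename):
        self.current = self.files.setdefault(filename, {})

    def emit_function(self, ident):
        # The callback does not say which file it belongs to, so it is
        # attributed to whatever start_file() selected last.
        self.current_fn = self.current.setdefault(ident, [])

    def emit_arcs(self, counters):
        self.current_fn.extend(counters)

    def end_file(self):
        self.current = self.current_fn = None


def writeout(collector):
    # Stand-in for the compiler-generated "writeout" callback.
    collector.start_file("a.gcda")
    collector.emit_function(1)
    collector.emit_arcs([3, 0])
    collector.emit_function(2)
    collector.emit_arcs([7])
    collector.end_file()


collector = GcovCollector()
writeout(collector)
```

    Because each call carries only its own arguments, the receiver has to
    remember the file and function selected by the preceding calls.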

    See LLVM's lib/Transforms/Instrumentation/GCOVProfiling.cpp for more
    details on how this works, particularly GCOVProfiler::emitProfileArcs(),
    GCOVProfiler::insertCounterWriteout(), and GCOVProfiler::insertFlush().
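
    The need for local state can be illustrated with a small userspace
    model. This is only a sketch: the class, method names and arguments
    below are hypothetical simplifications, not the signatures of the real
    llvm_gcda_*() callbacks. It shows why the receiver has to remember which
    file and function it is currently handling, since emit_arcs() carries no
    reference to either.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FunctionInfo:
    name: str
    counters: List[int] = field(default_factory=list)


@dataclass
class FileInfo:
    filename: str
    functions: List[FunctionInfo] = field(default_factory=list)


class GcovReceiver:
    """Hypothetical receiver for a start/emit/end callback sequence."""

    def __init__(self) -> None:
        self.files: List[FileInfo] = []
        # Local state: the file currently being written out, if any.
        self.current: Optional[FileInfo] = None

    def _require_file(self) -> FileInfo:
        if self.current is None:
            raise RuntimeError("callback received outside start/end file")
        return self.current

    def start_file(self, filename: str) -> None:
        self.current = FileInfo(filename)

    def emit_function(self, name: str) -> None:
        self._require_file().functions.append(FunctionInfo(name))

    def emit_arcs(self, counters: List[int]) -> None:
        # Arcs belong to whichever function was emitted most recently.
        functions = self._require_file().functions
        if not functions:
            raise RuntimeError("arcs emitted before any function")
        functions[-1].counters = list(counters)

    def summary_info(self) -> None:
        # Nothing to record in this model; kept to mirror the sequence.
        self._require_file()

    def end_file(self) -> None:
        self.files.append(self._require_file())
        self.current = None
```

    Driving the model in the order listed above (start_file, then
    emit_function and emit_arcs per function, then summary_info and
    end_file) leaves one completed FileInfo in receiver.files.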

    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/20190417225328.208129-1-trong@android.com
    Signed-off-by: Greg Hackmann
    Signed-off-by: Nick Desaulniers
    Signed-off-by: Tri Vo
    Co-developed-by: Nick Desaulniers
    Co-developed-by: Tri Vo
    Tested-by: Trilok Soni
    Tested-by: Prasad Sodagudi
    Tested-by: Tri Vo
    Tested-by: Daniel Mentz
    Tested-by: Petri Gynther
    Reviewed-by: Peter Oberparleiter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Greg Hackmann
     
  • Document some things of note to gcov users:
    1. GCC gcov and Clang llvm-cov tools are not compatible.
    2. The use of GCC vs Clang is transparent at build-time.

    Also adjust the documentation to account for the removal of config symbol
    CONFIG_GCOV_FORMAT_AUTODETECT by commit 6a61b70b43c9 ("gcov: remove
    CONFIG_GCOV_FORMAT_AUTODETECT").

    Link: http://lkml.kernel.org/r/20190318025411.98014-4-trong@android.com
    Signed-off-by: Tri Vo
    Reviewed-by: Peter Oberparleiter
    Cc: Daniel Mentz
    Cc: Greg Hackmann
    Cc: Nick Desaulniers
    Cc: Petri Gynther
    Cc: Prasad Sodagudi
    Cc: Trilok Soni
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tri Vo
     
  • Patch series "gcov: add Clang support", v4.

    This patch (of 3):

    base.c contains a few callbacks specific to GCC's gcov implementation.
    Move these into their own module in preparation for Clang support.

    Link: http://lkml.kernel.org/r/20190318025411.98014-2-trong@android.com
    Signed-off-by: Greg Hackmann
    Signed-off-by: Nick Desaulniers
    Signed-off-by: Tri Vo
    Tested-by: Trilok Soni
    Tested-by: Prasad Sodagudi
    Tested-by: Tri Vo
    Reviewed-by: Peter Oberparleiter
    Cc: Daniel Mentz
    Cc: Petri Gynther
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Greg Hackmann
     
  • Fix sparse warning:

    fs/eventfd.c:26:1: warning:
    symbol 'eventfd_ida' was not declared. Should it be static?

    Link: http://lkml.kernel.org/r/20190413142348.34716-1-yuehaibing@huawei.com
    Signed-off-by: YueHaibing
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    YueHaibing
     
  • Finding the endpoints of an IPC channel is one of the essential tasks
    in understanding how a user program works. Procfs and netlink sockets
    provide enough hints to find endpoints for IPC channels like pipes, unix
    sockets, and pseudo terminals. However, there is no simple way to find
    endpoints for an eventfd file from userland. The inode number gives no
    hint: unlike pipes, all eventfd files share the same inode object.

    To provide a way to find the endpoints of an eventfd file, this patch
    adds an "eventfd-id" field to the /proc/PID/fdinfo of an eventfd as an
    identifier. Integers managed by an IDA are used as ids.

    A tool like lsof can utilize the information to print endpoints.
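
    As a rough sketch of how such a tool might use the field, the Python
    below extracts "eventfd-id" from fdinfo text and groups file descriptors
    that share an id. The function names and the sample text are
    illustrative assumptions; only the field name comes from this patch, and
    the "key: value" line layout is assumed.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def parse_eventfd_id(fdinfo_text: str) -> Optional[int]:
    """Return the eventfd-id found in fdinfo text, or None if absent."""
    for line in fdinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "eventfd-id":
            return int(value.strip())
    return None


def group_endpoints(
    entries: Iterable[Tuple[int, int, str]]
) -> Dict[int, List[Tuple[int, int]]]:
    """Group (pid, fd) pairs by eventfd-id.

    Each entry is (pid, fd, fdinfo_text); entries without an
    eventfd-id line (non-eventfd descriptors) are skipped.
    """
    groups: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for pid, fd, text in entries:
        eventfd_id = parse_eventfd_id(text)
        if eventfd_id is not None:
            groups[eventfd_id].append((pid, fd))
    return dict(groups)


# Illustrative fdinfo contents, not captured from a real system.
SAMPLE_EVENTFD = "pos:\t0\nflags:\t02\neventfd-id: 7\n"
SAMPLE_OTHER = "pos:\t0\nflags:\t0100002\n"
```

    Two descriptors that report the same id, possibly in different
    processes, would then be treated as endpoints of the same eventfd.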

    Link: http://lkml.kernel.org/r/20190327181823.20222-1-yamato@redhat.com
    Signed-off-by: Masatake YAMATO
    Cc: Al Viro
    Cc: Kees Cook
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Masatake YAMATO
     
  • Hash functions are no longer needed now that idr is used. Remove the
    hash header file as a cleanup.

    Link: http://lkml.kernel.org/r/20190430053319.95913-1-scuttimmy@gmail.com
    Signed-off-by: Timmy Li
    Cc: "Eric W. Biederman"
    Cc: Michal Hocko
    Cc: Matthew Wilcox
    Cc: Oleg Nesterov
    Cc: Mike Rapoport
    Cc: KJ Tsanaktsidis
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Timmy Li
     
  • Today, proc_do_large_bitmap() truncates a large write input buffer to
    PAGE_SIZE - 1, which may result in misparsed numbers at the (truncated)
    end of the buffer. Further, it fails to notify the caller that the
    buffer was truncated, so it doesn't get called iteratively to finish the
    entire input buffer.

    Tell the caller if there's more work to do by adding the skipped amount
    back to left/*lenp before returning.

    To fix the misparsing, reset the position if we have completely consumed
    a truncated buffer (or if just one char is left, which may be a "-" in a
    range), and ask the caller to come back for more.
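
    The failure mode and the "come back for more" contract can be modelled
    in userspace. The sketch below is not the kernel code and uses a
    different, simpler technique: instead of resetting the position, it backs
    up to the last comma whenever the input was truncated, and reports how
    many characters it consumed so that the caller can resubmit the rest.
    All names and the size limit are hypothetical.

```python
from typing import Set, Tuple


def parse_chunk(data: str, limit: int) -> Tuple[Set[int], int]:
    """Parse at most `limit` characters of a list such as "1,5-7".

    Returns the bits found and the number of characters consumed.
    """
    truncated = len(data) > limit
    chunk = data[:limit]
    if truncated:
        # A token cut off by truncation cannot be trusted, so only
        # consume up to and including the last separator.
        cut = chunk.rfind(",")
        if cut == -1:
            raise ValueError("single token longer than limit")
        chunk = chunk[: cut + 1]

    bits: Set[int] = set()
    for token in chunk.split(","):
        token = token.strip()
        if not token:
            continue
        low, sep, high = token.partition("-")
        start = int(low)
        end = int(high) if sep else start
        bits.update(range(start, end + 1))
    return bits, len(chunk)


def parse_all(data: str, limit: int) -> Set[int]:
    """Call parse_chunk repeatedly until the whole input is consumed."""
    bits: Set[int] = set()
    pos = 0
    while pos < len(data):
        chunk_bits, consumed = parse_chunk(data[pos:], limit)
        bits |= chunk_bits
        pos += consumed
    return bits
```

    With the input "1,5-7,100-102" and a limit of 8, parse_all() yields
    {1, 5, 6, 7, 100, 101, 102}. Plain truncation to the first 8 characters
    ("1,5-7,10") would instead parse a stray 10 and never see the rest.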

    Link: http://lkml.kernel.org/r/20190320222831.8243-7-mcgrof@kernel.org
    Signed-off-by: Eric Sandeen
    Signed-off-by: Luis Chamberlain
    Acked-by: Kees Cook
    Cc: Eric Sandeen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Sandeen