01 Oct, 2020

2 commits

  • [ Upstream commit f643ee295c1c63bc117fb052d4da681354d6f732 ]

    The original patch that brought in the "SCTP ACK tracking trace event"
    feature was committed on Dec. 20, 2017. It replaced jprobe usage with
    trace events and brought in two trace events: TRACE_EVENT(sctp_probe)
    and TRACE_EVENT(sctp_probe_path). The original patch intended to
    trigger trace_sctp_probe_path from within TRACE_EVENT(sctp_probe), as
    in the code below:

    +TRACE_EVENT(sctp_probe,
    +
    + TP_PROTO(const struct sctp_endpoint *ep,
    + const struct sctp_association *asoc,
    + struct sctp_chunk *chunk),
    +
    + TP_ARGS(ep, asoc, chunk),
    +
    + TP_STRUCT__entry(
    + __field(__u64, asoc)
    + __field(__u32, mark)
    + __field(__u16, bind_port)
    + __field(__u16, peer_port)
    + __field(__u32, pathmtu)
    + __field(__u32, rwnd)
    + __field(__u16, unack_data)
    + ),
    +
    + TP_fast_assign(
    + struct sk_buff *skb = chunk->skb;
    +
    + __entry->asoc = (unsigned long)asoc;
    + __entry->mark = skb->mark;
    + __entry->bind_port = ep->base.bind_addr.port;
    + __entry->peer_port = asoc->peer.port;
    + __entry->pathmtu = asoc->pathmtu;
    + __entry->rwnd = asoc->peer.rwnd;
    + __entry->unack_data = asoc->unack_data;
    +
    + if (trace_sctp_probe_path_enabled()) {
    + struct sctp_transport *sp;
    +
    + list_for_each_entry(sp, &asoc->peer.transport_addr_list,
    + transports) {
    + trace_sctp_probe_path(sp, asoc);
    + }
    + }
    + ),

    However, this did not work in testing: trace_sctp_probe_path produced
    no output. The cause turned out to be the trace buffer lock operation
    (trace_event_buffer_reserve()) in include/trace/trace_events.h:

    static notrace void \
    trace_event_raw_event_##call(void *__data, proto) \
    { \
    struct trace_event_file *trace_file = __data; \
    struct trace_event_data_offsets_##call __maybe_unused __data_offsets;\
    struct trace_event_buffer fbuffer; \
    struct trace_event_raw_##call *entry; \
    int __data_size; \
    \
    if (trace_trigger_soft_disabled(trace_file)) \
    return; \
    \
    __data_size = trace_event_get_offsets_##call(&__data_offsets, args); \
    \
    entry = trace_event_buffer_reserve(&fbuffer, trace_file, \
    sizeof(*entry) + __data_size); \
    \
    if (!entry) \
    return; \
    \
    tstruct \
    \
    { assign; } \
    \
    trace_event_buffer_commit(&fbuffer); \
    }

    trace_sctp_probe_path produced no output because it is called from the
    TP_fast_assign part of TRACE_EVENT(sctp_probe), which the macro
    expansion places ( { assign; } ) after trace_event_buffer_reserve():

    entry = trace_event_buffer_reserve(&fbuffer, trace_file, \
    sizeof(*entry) + __data_size); \
    \
    if (!entry) \
    return; \
    \
    tstruct \
    \
    { assign; } \

    so trace_sctp_probe_path cannot acquire the trace event buffer and
    returns without output. In other words, nesting tracepoint entry
    functions is not allowed. The function call flow is:

    trace_sctp_probe()
    -> trace_event_raw_event_sctp_probe()
    -> lock buffer
    -> trace_sctp_probe_path()
    -> trace_event_raw_event_sctp_probe_path() --nested
    -> buffer has been locked and return no output.

    This patch removes trace_sctp_probe_path from the TP_fast_assign part
    of TRACE_EVENT(sctp_probe) to avoid nesting the entry functions, and
    triggers sctp_probe_path_trace in sctp_outq_sack instead.
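    The failure mode can be modeled in a few lines of user-space C. This is
    a simplified sketch, not the kernel ring buffer: it assumes a single
    "reserved" flag that makes a nested reserve fail, standing in for the
    protection behind trace_event_buffer_reserve(). All function names
    below are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical model of one CPU's trace buffer: while an event holds a
 * reservation, a second (nested) reserve in the same context fails. */
static bool buf_busy;
static int events_written;

static bool buf_reserve(void)
{
	if (buf_busy)
		return false;	/* nested reserve: no entry, no output */
	buf_busy = true;
	return true;
}

static void buf_commit(void)
{
	buf_busy = false;
	events_written++;
}

/* Stand-in for trace_event_raw_event_sctp_probe_path(). */
static void emit_probe_path(void)
{
	if (!buf_reserve())
		return;
	buf_commit();
}

/* Stand-in for trace_event_raw_event_sctp_probe(); nest_path mimics
 * calling the path event from inside TP_fast_assign. */
static void emit_probe(int npaths, bool nest_path)
{
	if (!buf_reserve())
		return;
	for (int i = 0; nest_path && i < npaths; i++)
		emit_probe_path();	/* always sees buf_busy == true */
	buf_commit();
}

/* After the fix: the caller emits the events one after another,
 * so each one gets its own reservation. */
static void emit_sequential(int npaths)
{
	emit_probe(npaths, false);
	for (int i = 0; i < npaths; i++)
		emit_probe_path();
}
```

    With three transports, the nested variant records only the sctp_probe
    entry, while the sequential variant records all four events.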

    After this patch, you can enable both events individually:
    # cd /sys/kernel/debug/tracing
    # echo 1 > events/sctp/sctp_probe/enable
    # echo 1 > events/sctp/sctp_probe_path/enable

    Or, you can enable all the events under sctp.

    # echo 1 > events/sctp/enable

    Signed-off-by: Kevin Kou
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Kevin Kou
     
  • [ Upstream commit a264abad51d8ecb7954a2f6d9f1885b38daffc74 ]

    RPC tasks on the backchannel never invoke xprt_complete_rqst(), so
    there is no way to report their tk_status at completion. In addition,
    any RPC task that exits via rpc_exit_task() before it is replied to
    disappears without a trace.

    Introduce a trace point that is symmetrical with rpc_task_begin that
    captures the termination status of each RPC task.

    Sample trace output for callback requests initiated on the server:
    kworker/u8:12-448 [003] 127.025240: rpc_task_end: task:50@3 flags=ASYNC|DYNAMIC|SOFT|SOFTCONN|SENT runstate=RUNNING|ACTIVE status=0 action=rpc_exit_task
    kworker/u8:12-448 [002] 127.567310: rpc_task_end: task:51@3 flags=ASYNC|DYNAMIC|SOFT|SOFTCONN|SENT runstate=RUNNING|ACTIVE status=0 action=rpc_exit_task
    kworker/u8:12-448 [001] 130.506817: rpc_task_end: task:52@3 flags=ASYNC|DYNAMIC|SOFT|SOFTCONN|SENT runstate=RUNNING|ACTIVE status=0 action=rpc_exit_task

    Odd, though, that I never see trace_rpc_task_complete, either in the
    forward or backchannel. Should it be removed?

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust
    Signed-off-by: Sasha Levin

    Chuck Lever
     

03 Sep, 2020

1 commit

  • commit f9cae926f35e8230330f28c7b743ad088611a8de upstream.

    When we are processing writeback for sync(2), move_expired_inodes()
    didn't set any inode expiry value (older_than_this). This can result in
    writeback never completing if there's a steady stream of inodes added
    to the b_dirty_time list, because writeback rechecks the dirty lists
    after each writeback round to see whether there's more work to be done.
    Fix the problem by using the sync(2) start time as the inode expiry
    value when processing the b_dirty_time list, as is done for ordinarily
    dirtied inodes. This requires some refactoring of older_than_this
    handling, which simplifies the code noticeably as a bonus.
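    Why a fixed cutoff guarantees termination can be shown with a toy
    model. This sketch assumes a hypothetical array of dirtied_when
    timestamps and a helper move_expired() loosely modeled on
    move_expired_inodes(); it is not the fs/fs-writeback.c code. Only
    inodes dirtied at or before the cutoff are moved, so inodes that keep
    arriving during writeback cannot extend the round indefinitely.

```c
#include <stddef.h>

/* Hypothetical dirty-list entry: just the time the inode was dirtied. */
struct toy_inode {
	unsigned long dirtied_when;
	int queued;		/* 1 once moved to the I/O queue */
};

/*
 * Move every not-yet-queued inode dirtied at or before @older_than_this
 * to the I/O queue and return how many were moved. Inodes dirtied after
 * the cutoff (e.g. after sync(2) started) are left for a later pass.
 */
static size_t move_expired(struct toy_inode *list, size_t n,
			   unsigned long older_than_this)
{
	size_t moved = 0;

	for (size_t i = 0; i < n; i++) {
		if (list[i].queued || list[i].dirtied_when > older_than_this)
			continue;
		list[i].queued = 1;
		moved++;
	}
	return moved;
}
```

    A second call with the same cutoff moves nothing, which is the
    property that lets the sync(2) loop finish.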

    Fixes: 0ae45f63d4ef ("vfs: add support for a lazytime mount option")
    CC: stable@vger.kernel.org
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jan Kara
    Signed-off-by: Greg Kroah-Hartman

    Jan Kara
     

22 Jul, 2020

1 commit

  • commit aadf9dcef9d4cd68c73a4ab934f93319c4becc47 upstream.

    The trace symbol printer (__print_symbolic()) ignores symbols that map to
    an empty string and prints the hex value instead.

    Fix the symbol for rxrpc_cong_no_change to " -" instead of "" to avoid
    this.
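    The behaviour being worked around can be sketched as a lookup whose
    hex fallback is triggered by "nothing was printed", so an empty name
    is indistinguishable from a missing one. This is a simplified model,
    not the kernel's symbol printer; the table contents and function name
    are hypothetical.

```c
#include <stdio.h>
#include <string.h>

struct sym {
	unsigned long value;
	const char *name;
};

/* Print the name matching @val into @buf; if nothing was printed
 * (no match, or the matching name is ""), fall back to hex. */
static const char *print_symbolic(char *buf, size_t len, unsigned long val,
				  const struct sym *tab)
{
	buf[0] = '\0';
	for (; tab->name; tab++) {
		if (tab->value == val) {
			snprintf(buf, len, "%s", tab->name);
			break;
		}
	}
	if (buf[0] == '\0')	/* empty output is treated as "unknown" */
		snprintf(buf, len, "0x%lx", val);
	return buf;
}
```

    With an empty-string entry the lookup prints "0x0"; changing the entry
    to " -" makes the symbol appear as intended.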

    Fixes: b54a134a7de4 ("rxrpc: Fix handling of enums-to-string translation in tracing")
    Signed-off-by: David Howells
    Signed-off-by: Greg Kroah-Hartman

    David Howells
     

27 May, 2020

2 commits

  • [ Upstream commit d1f129470e6cb79b8b97fecd12689f6eb49e27fe ]

    Add a tracepoint to track received ACKs that are discarded due to being
    outside of the Tx window.

    Signed-off-by: David Howells
    Signed-off-by: Sasha Levin

    David Howells
     
  • commit c410bf01933e5e09d142c66c3df9ad470a7eec13 upstream.

    rxrpc currently uses a fixed 4s retransmission timeout until the RTT is
    sufficiently sampled. This can cause problems with some fileservers with
    calls to the cache manager in the afs filesystem being dropped from the
    fileserver because a packet goes missing and the retransmission timeout is
    greater than the call expiry timeout.

    Fix this by:

    (1) Copying the RTT/RTO calculation code from Linux's TCP implementation
    and altering it to fit rxrpc.

    (2) Altering the various users of the RTT to make use of the new SRTT
    value.

    (3) Replacing rxrpc_resend_timeout with the calculated RTO value
    (which is needed in jiffies), along with a backoff.

    Notes:

    (1) rxrpc provides RTT samples by matching the serial numbers on outgoing
    DATA packets that have the RXRPC_REQUEST_ACK set and PING ACK packets
    against the reference serial number in incoming REQUESTED ACK and
    PING-RESPONSE ACK packets.

    (2) Each packet that is transmitted on an rxrpc connection gets a new
    per-connection serial number, even for retransmissions, so an ACK can
    be cross-referenced to a specific trigger packet. This allows RTT
    information to be drawn from retransmitted DATA packets also.

    (3) rxrpc maintains the RTT/RTO state on the rxrpc_peer record rather than
    on an rxrpc_call because many RPC calls won't live long enough to
    generate more than one sample.

    (4) The calculated SRTT value is in units of 8ths of a microsecond rather
    than nanoseconds.

    The (S)RTT and RTO values are displayed in /proc/net/rxrpc/peers.
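    For reference, the TCP-style estimator the patch describes copying can
    be sketched as the RFC 6298 update in the fixed-point form Linux TCP
    uses (SRTT stored x8, mean deviation stored x4). This is a standalone
    sketch under that assumption: the struct and function names are
    hypothetical, and the minimum-RTO clamp and backoff are omitted.
    Storing SRTT x8 with microsecond samples gives the "8ths of a
    microsecond" unit from note (4).

```c
#include <stdint.h>

struct rtt_est {
	uint32_t srtt_x8;	/* smoothed RTT, units of 1/8 us */
	uint32_t mdev_x4;	/* mean deviation x4 (= 4 * RTTVAR), in us */
};

/* Feed one RTT sample (microseconds) into the estimator. */
static void rtt_sample(struct rtt_est *e, uint32_t m_us)
{
	if (e->srtt_x8 == 0) {
		/* First sample: SRTT = R, RTTVAR = R/2 */
		e->srtt_x8 = m_us << 3;
		e->mdev_x4 = m_us << 1;
		return;
	}

	int64_t err = (int64_t)m_us - (e->srtt_x8 >> 3);
	int64_t abs_err = err < 0 ? -err : err;

	/* RTTVAR = 3/4 RTTVAR + 1/4 |err| (uses the old SRTT) */
	e->mdev_x4 = (uint32_t)((int64_t)e->mdev_x4 + abs_err -
				(e->mdev_x4 >> 2));
	/* SRTT = 7/8 SRTT + 1/8 R */
	e->srtt_x8 = (uint32_t)((int64_t)e->srtt_x8 + err);
}

/* RTO = SRTT + 4 * RTTVAR, in microseconds. */
static uint32_t rto_us(const struct rtt_est *e)
{
	return (e->srtt_x8 >> 3) + e->mdev_x4;
}
```

    Two 100 us samples give an RTO of 300 us and then 250 us, matching the
    RFC 6298 formulas computed by hand.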

    Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
    Signed-off-by: David Howells
    Signed-off-by: Greg Kroah-Hartman

    David Howells
     

02 May, 2020

2 commits

  • commit d6c8e949a35d6906d6c03a50e9a9cdf4e494528a upstream.

    Systemtap 4.2 is unable to correctly interpret the "u32 (*missed_ppm)[2]"
    argument of the iocost_ioc_vrate_adj trace entry defined in
    include/trace/events/iocost.h leading to the following error:

    /tmp/stapAcz0G0/stap_c89c58b83cea1724e26395efa9ed4939_6321_aux_6.c:78:8:
    error: expected ‘;’, ‘,’ or ‘)’ before ‘*’ token
    , u32[]* __tracepoint_arg_missed_ppm

    That argument type is indeed rather complex and hard to read. Looking
    at block/blk-iocost.c, it is just a 2-entry u32 array. Simplifying the
    argument to a plain "u32 *missed_ppm" and adjusting the trace entry
    accordingly made the compilation error go away.
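    The two parameter spellings reach the same two counters; the simpler
    one just drops the array bound from the type. A minimal illustration
    with hypothetical function names (only the two parameter forms come
    from the commit):

```c
#include <stdint.h>

typedef uint32_t u32;

/* Old style: pointer to an array of exactly two u32. */
static u32 sum_old(u32 (*missed_ppm)[2])
{
	return (*missed_ppm)[0] + (*missed_ppm)[1];
}

/* New style: pointer to the first element; the callee indexes 0 and 1. */
static u32 sum_new(u32 *missed_ppm)
{
	return missed_ppm[0] + missed_ppm[1];
}
```

    A caller passes `&ppm` to the first form and `ppm` (which decays to
    `&ppm[0]`) to the second.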

    Fixes: 7caa47151ab2 ("blkcg: implement blk-iocost")
    Acked-by: Steven Rostedt (VMware)
    Acked-by: Tejun Heo
    Signed-off-by: Waiman Long
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Waiman Long
     
  • commit e28b4fc652c1830796a4d3e09565f30c20f9a2cf upstream.

    I hit this while testing nfsd-5.7 with kernel memory debugging
    enabled on my server:

    Mar 30 13:21:45 klimt kernel: BUG: unable to handle page fault for address: ffff8887e6c279a8
    Mar 30 13:21:45 klimt kernel: #PF: supervisor read access in kernel mode
    Mar 30 13:21:45 klimt kernel: #PF: error_code(0x0000) - not-present page
    Mar 30 13:21:45 klimt kernel: PGD 3601067 P4D 3601067 PUD 87c519067 PMD 87c3e2067 PTE 800ffff8193d8060
    Mar 30 13:21:45 klimt kernel: Oops: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
    Mar 30 13:21:45 klimt kernel: CPU: 2 PID: 1933 Comm: nfsd Not tainted 5.6.0-rc6-00040-g881e87a3c6f9 #1591
    Mar 30 13:21:45 klimt kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015
    Mar 30 13:21:45 klimt kernel: RIP: 0010:svc_rdma_post_chunk_ctxt+0xab/0x284 [rpcrdma]
    Mar 30 13:21:45 klimt kernel: Code: c1 83 34 02 00 00 29 d0 85 c0 7e 72 48 8b bb a0 02 00 00 48 8d 54 24 08 4c 89 e6 48 8b 07 48 8b 40 20 e8 5a 5c 2b e1 41 89 c6 45 20 89 44 24 04 8b 05 02 e9 01 00 85 c0 7e 33 e9 5e 01 00 00
    Mar 30 13:21:45 klimt kernel: RSP: 0018:ffffc90000dfbdd8 EFLAGS: 00010286
    Mar 30 13:21:45 klimt kernel: RAX: 0000000000000000 RBX: ffff8887db8db400 RCX: 0000000000000030
    Mar 30 13:21:45 klimt kernel: RDX: 0000000000000040 RSI: 0000000000000000 RDI: 0000000000000246
    Mar 30 13:21:45 klimt kernel: RBP: ffff8887e6c27988 R08: 0000000000000000 R09: 0000000000000004
    Mar 30 13:21:45 klimt kernel: R10: ffffc90000dfbdd8 R11: 00c068ef00000000 R12: ffff8887eb4e4a80
    Mar 30 13:21:45 klimt kernel: R13: ffff8887db8db634 R14: 0000000000000000 R15: ffff8887fc931000
    Mar 30 13:21:45 klimt kernel: FS: 0000000000000000(0000) GS:ffff88885bd00000(0000) knlGS:0000000000000000
    Mar 30 13:21:45 klimt kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Mar 30 13:21:45 klimt kernel: CR2: ffff8887e6c279a8 CR3: 000000081b72e002 CR4: 00000000001606e0
    Mar 30 13:21:45 klimt kernel: Call Trace:
    Mar 30 13:21:45 klimt kernel: ? svc_rdma_vec_to_sg+0x7f/0x7f [rpcrdma]
    Mar 30 13:21:45 klimt kernel: svc_rdma_send_write_chunk+0x59/0xce [rpcrdma]
    Mar 30 13:21:45 klimt kernel: svc_rdma_sendto+0xf9/0x3ae [rpcrdma]
    Mar 30 13:21:45 klimt kernel: ? nfsd_destroy+0x51/0x51 [nfsd]
    Mar 30 13:21:45 klimt kernel: svc_send+0x105/0x1e3 [sunrpc]
    Mar 30 13:21:45 klimt kernel: nfsd+0xf2/0x149 [nfsd]
    Mar 30 13:21:45 klimt kernel: kthread+0xf6/0xfb
    Mar 30 13:21:45 klimt kernel: ? kthread_queue_delayed_work+0x74/0x74
    Mar 30 13:21:45 klimt kernel: ret_from_fork+0x3a/0x50
    Mar 30 13:21:45 klimt kernel: Modules linked in: ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue ib_umad ib_ipoib mlx4_ib sb_edac x86_pkg_temp_thermal iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel glue_helper crypto_simd cryptd pcspkr rpcrdma i2c_i801 rdma_ucm lpc_ich mfd_core ib_iser rdma_cm iw_cm ib_cm mei_me raid0 libiscsi mei sg scsi_transport_iscsi ioatdma wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter nfsd nfs_acl lockd auth_rpcgss grace sunrpc ip_tables xfs libcrc32c mlx4_en sd_mod sr_mod cdrom mlx4_core crc32c_intel igb nvme i2c_algo_bit ahci i2c_core libahci nvme_core dca libata t10_pi qedr dm_mirror dm_region_hash dm_log dm_mod dax qede qed crc8 ib_uverbs ib_core
    Mar 30 13:21:45 klimt kernel: CR2: ffff8887e6c279a8
    Mar 30 13:21:45 klimt kernel: ---[ end trace 87971d2ad3429424 ]---

    It's absolutely not safe to use resources pointed to by the @send_wr
    argument of ib_post_send() _after_ that function returns. Those
    resources are typically freed by the Send completion handler, which
    can run before ib_post_send() returns.

    Thus the trace points currently around ib_post_send() in the
    server's RPC/RDMA transport are a hazard, even when they are
    disabled. Rearrange them so that they touch the Work Request only
    _before_ ib_post_send() is invoked.
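    The rule applies to any API whose completion may free the request
    before the call returns. The sketch below assumes a hypothetical
    post_send() whose "completion" runs synchronously and frees the work
    request; the safe pattern copies what the trace needs into locals
    before posting. None of these names are RDMA core or svcrdma APIs.

```c
#include <stdlib.h>

struct work_req {
	unsigned int id;
	int num_sge;
};

static unsigned int traced_id;
static int traced_sges;

/* Hypothetical tracepoint: records values, never the pointer. */
static void trace_post_send(unsigned int id, int num_sge)
{
	traced_id = id;
	traced_sges = num_sge;
}

/* Hypothetical post: the completion may run before we return and free @wr. */
static int post_send(struct work_req *wr)
{
	free(wr);		/* models the Send completion handler */
	return 0;
}

static int send_and_trace(struct work_req *wr)
{
	/* Capture everything the trace needs while @wr is still ours. */
	unsigned int id = wr->id;
	int num_sge = wr->num_sge;

	trace_post_send(id, num_sge);
	return post_send(wr);	/* @wr must not be dereferenced after this */
}
```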

    Fixes: bd2abef33394 ("svcrdma: Trace key RDMA API events")
    Fixes: 4201c7464753 ("svcrdma: Introduce svc_rdma_send_ctxt")
    Signed-off-by: Chuck Lever
    Signed-off-by: Greg Kroah-Hartman

    Chuck Lever
     

01 Apr, 2020

1 commit

  • commit 4636cf184d6d9a92a56c2554681ea520dd4fe49a upstream.

    Fix a couple of tracelines to indicate the usage count after the atomic op,
    not the usage count before it to be consistent with other afs and rxrpc
    trace lines.
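    With C11 atomics, the distinction is between the value a fetch-and-add
    returns (the count before) and that value plus the increment (the
    count after). A sketch of tracing the post-operation count, using a
    hypothetical get_ref() helper rather than the afs code:

```c
#include <stdatomic.h>

static int last_traced;

/* Hypothetical trace hook. */
static void trace_usage(int usage)
{
	last_traced = usage;
}

/* Take a reference and trace the usage count *after* the increment. */
static int get_ref(atomic_int *usage)
{
	/* atomic_fetch_add() returns the old value, so add 1 for the new. */
	int n = atomic_fetch_add(usage, 1) + 1;

	trace_usage(n);
	return n;
}
```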

    Change the wording of the afs_call_trace_work trace ID label from "WORK" to
    "QUEUE" to reflect the fact that it's queueing work, not doing work.

    Fixes: 341f741f04be ("afs: Refcount the afs_call struct")
    Signed-off-by: David Howells
    Signed-off-by: Greg Kroah-Hartman

    David Howells
     

24 Feb, 2020

1 commit

  • [ Upstream commit 6cf539a87a61a4fbc43f625267dbcbcf283872ed ]

    This fixes a data-race where `atomic_t dynticks` is copied by value. The
    copy is performed non-atomically, resulting in a data-race if `dynticks`
    is updated concurrently.

    This data-race was found with KCSAN:
    ==================================================================
    BUG: KCSAN: data-race in dyntick_save_progress_counter / rcu_irq_enter

    write to 0xffff989dbdbe98e0 of 4 bytes by task 10 on cpu 3:
    atomic_add_return include/asm-generic/atomic-instrumented.h:78 [inline]
    rcu_dynticks_snap kernel/rcu/tree.c:310 [inline]
    dyntick_save_progress_counter+0x43/0x1b0 kernel/rcu/tree.c:984
    force_qs_rnp+0x183/0x200 kernel/rcu/tree.c:2286
    rcu_gp_fqs kernel/rcu/tree.c:1601 [inline]
    rcu_gp_fqs_loop+0x71/0x880 kernel/rcu/tree.c:1653
    rcu_gp_kthread+0x22c/0x3b0 kernel/rcu/tree.c:1799
    kthread+0x1b5/0x200 kernel/kthread.c:255

    read to 0xffff989dbdbe98e0 of 4 bytes by task 154 on cpu 7:
    rcu_nmi_enter_common kernel/rcu/tree.c:828 [inline]
    rcu_irq_enter+0xda/0x240 kernel/rcu/tree.c:870
    irq_enter+0x5/0x50 kernel/softirq.c:347

    Reported by Kernel Concurrency Sanitizer on:
    CPU: 7 PID: 154 Comm: kworker/7:1H Not tainted 5.3.0+ #5
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
    Workqueue: kblockd blk_mq_run_work_fn
    ==================================================================

    Signed-off-by: Marco Elver
    Cc: Paul E. McKenney
    Cc: Josh Triplett
    Cc: Steven Rostedt
    Cc: Mathieu Desnoyers
    Cc: Joel Fernandes
    Cc: Ingo Molnar
    Cc: Dmitry Vyukov
    Cc: rcu@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Reviewed-by: Joel Fernandes (Google)
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Sasha Levin

    Marco Elver
     

11 Feb, 2020

1 commit

  • commit 68f23b89067fdf187763e75a56087550624fdbee upstream.

    Without memcg, there is a one-to-one mapping between the bdi and
    bdi_writeback structures. In this world, things are fairly
    straightforward; the first thing bdi_unregister() does is to shut down
    the bdi_writeback structure (or wb), and part of that shutdown ensures
    that no other work is queued against the wb and that the wb is fully
    drained.

    With memcg, however, there is a one-to-many relationship between the bdi
    and bdi_writeback structures; that is, there are multiple wb objects
    which can all point to a single bdi. There is a refcount which prevents
    the bdi object from being released (and hence, unregistered). So in
    theory, the bdi_unregister() *should* only get called once its refcount
    goes to zero (bdi_put will drop the refcount, and when it is zero,
    release_bdi gets called, which calls bdi_unregister).

    Unfortunately, del_gendisk() in block/genhd.c never got the memo about
    the Brave New memcg World, and calls bdi_unregister directly. It does
    this without informing the file system, or the memcg code, or anything
    else. This causes the root wb associated with the bdi to be
    unregistered, but none of the memcg-specific wbs are shut down. So when
    one of these wbs is woken up to do delayed work, it tries to
    dereference its wb->bdi->dev to fetch the device name, but
    unfortunately bdi->dev is now NULL, thanks to the bdi_unregister()
    called by del_gendisk(). As a result, *boom*.

    Fortunately, it looks like the rest of the writeback path is perfectly
    happy with bdi->dev and bdi->owner being NULL, so the simplest fix is to
    create a bdi_dev_name() function which can handle bdi->dev being NULL.
    This also allows us to bulletproof the writeback tracepoints to prevent
    them from dereferencing a NULL pointer and crashing the kernel if one is
    tracing with memcg enabled and an iSCSI device dies or a USB storage
    stick is pulled.
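    A NULL-tolerant name helper of the kind described might look like
    this. The structs are stripped-down stand-ins, and the "(unknown)"
    fallback string is an assumption for illustration.

```c
#include <stddef.h>

/* Stripped-down stand-ins for the kernel structures. */
struct device {
	const char *name;
};

struct backing_dev_info {
	struct device *dev;	/* NULL after bdi_unregister() */
};

/* Return a printable device name even after the bdi was unregistered. */
static const char *bdi_dev_name(const struct backing_dev_info *bdi)
{
	if (!bdi || !bdi->dev)
		return "(unknown)";	/* assumed placeholder */
	return bdi->dev->name;
}
```

    Tracepoints that call such a helper instead of dereferencing
    bdi->dev directly keep working after hot removal.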

    The most common way of triggering this will be hotremoval of a device
    while writeback with memcg enabled is going on. It was triggering
    several times a day in a heavily loaded production environment.

    Google Bug Id: 145475544

    Link: https://lore.kernel.org/r/20191227194829.150110-1-tytso@mit.edu
    Link: http://lkml.kernel.org/r/20191228005211.163952-1-tytso@mit.edu
    Signed-off-by: Theodore Ts'o
    Cc: Chris Mason
    Cc: Tejun Heo
    Cc: Jens Axboe
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Theodore Ts'o
     

29 Jan, 2020

1 commit

  • commit d0695e2351102affd8efae83989056bc4b275917 upstream.

    Like commit 0566e40ce7 ("tracing: initcall: Ordered comparison of
    function pointers"), this patch fixes another remaining instance, in
    xen.h, found by clang-9.

    In file included from arch/x86/xen/trace.c:21:
    In file included from ./include/trace/events/xen.h:475:
    In file included from ./include/trace/define_trace.h:102:
    In file included from ./include/trace/trace_events.h:473:
    ./include/trace/events/xen.h:69:7: warning: ordered comparison of function \
    pointers ('xen_mc_callback_fn_t' (aka 'void (*)(void *)') and 'xen_mc_callback_fn_t') [-Wordered-compare-function-pointers]
    __field(xen_mc_callback_fn_t, fn)
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    ./include/trace/trace_events.h:421:29: note: expanded from macro '__field'
    ^
    ./include/trace/trace_events.h:407:6: note: expanded from macro '__field_ext'
    is_signed_type(type), filter_type); \
    ^
    ./include/linux/trace_events.h:554:44: note: expanded from macro 'is_signed_type'
    ^
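    The warning comes from the is_signed_type() helper that __field() uses
    to record whether a field is signed. The definition below is assumed
    to match include/linux/trace_events.h of that era. On integer types it
    works as intended; on a function pointer type such as
    xen_mc_callback_fn_t ('void (*)(void *)' per the diagnostic) it turns
    into an ordered comparison of two function pointers, which is what
    clang flags. The sketch exercises it only on integer types.

```c
#include <stdint.h>

/* Assumed to match the kernel's macro of that era. */
#define is_signed_type(type) (((type)(-1)) < (type)1)

typedef void (*callback_fn_t)(void *);	/* same shape as xen_mc_callback_fn_t */

static const int int_is_signed = is_signed_type(int);
static const int u32_is_signed = is_signed_type(uint32_t);
/* is_signed_type(callback_fn_t) would compare two function pointers with
 * '<', which clang reports as -Wordered-compare-function-pointers. */
```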

    Fixes: c796f213a6934 ("xen/trace: add multicall tracing")
    Signed-off-by: Changbin Du
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Greg Kroah-Hartman

    Changbin Du
     

23 Jan, 2020

1 commit

  • commit 554913f600b45d73de12ad58c1ac7baa0f22a703 upstream.

    Commit 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem)
    FS") introduced a new khugepaged scan result, SCAN_PAGE_HAS_PRIVATE, but
    the corresponding description for trace events was not added.
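    Enum-to-string tables maintained by hand are how a new value ends up
    without a description. A compile-time check can catch part of that.
    This is a hypothetical sketch using a designated-initializer table and
    C11 _Static_assert, not the macro-based lists the trace headers
    actually use; the enum is a small illustrative subset.

```c
#include <stddef.h>

/* Illustrative subset of scan results (hypothetical ordering). */
enum scan_result {
	SCAN_FAIL,
	SCAN_SUCCEED,
	SCAN_PAGE_HAS_PRIVATE,
	SCAN_NR_RESULTS		/* keep last */
};

static const char *const scan_result_str[] = {
	[SCAN_FAIL]		= "failed",
	[SCAN_SUCCEED]		= "succeeded",
	[SCAN_PAGE_HAS_PRIVATE]	= "page_has_private",
};

/* Catches a table shorter than the enum (a missing last entry). */
_Static_assert(sizeof(scan_result_str) / sizeof(scan_result_str[0]) ==
	       SCAN_NR_RESULTS, "scan_result_str is missing entries");

/* A gap in the middle leaves a NULL slot; fall back instead of crashing. */
static const char *scan_result_name(enum scan_result r)
{
	const char *s = (size_t)r < SCAN_NR_RESULTS ? scan_result_str[r] : NULL;

	return s ? s : "unknown";
}
```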

    Link: http://lkml.kernel.org/r/1574793844-2914-1-git-send-email-yang.shi@linux.alibaba.com
    Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
    Signed-off-by: Yang Shi
    Cc: Song Liu
    Cc: Kirill A. Shutemov
    Cc: Anshuman Khandual
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Yang Shi
     

18 Jan, 2020

2 commits

  • commit 40a708bd622b78582ae3d280de29b09b50bd04c0 upstream.

    afs_lookup() has a tracepoint to indicate the outcome of
    d_splice_alias(), passing it the inode to retrieve the fid from.
    However, the function gave up its ref on that inode when it called
    d_splice_alias(), which may have failed and dropped the inode.

    Fix this by caching the fid.

    Fixes: 80548b03991f ("afs: Add more tracepoints")
    Reported-by: Al Viro
    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    David Howells
     
  • commit 4b93dab36f28e673725e5e6123ebfccf7697f96a upstream.

    When adding frwr_unmap_async way back when, I re-used the existing
    trace_xprtrdma_post_send() trace point to record the return code
    of ib_post_send.

    Unfortunately there are some cases where re-using that trace point
    causes a crash. Instead, construct a trace point specific to posting
    Local Invalidate WRs that will always be safe to use in that context,
    and will act as a trace log eye-catcher for Local Invalidation.

    Fixes: 847568942f93 ("xprtrdma: Remove fr_state")
    Fixes: d8099feda483 ("xprtrdma: Reduce context switching due ... ")
    Signed-off-by: Chuck Lever
    Tested-by: Bill Baker
    Signed-off-by: Anna Schumaker
    Signed-off-by: Greg Kroah-Hartman

    Chuck Lever
     

15 Jan, 2020

1 commit

  • commit bf44f488e168368cae4139b4b33c3d0aaa11679c upstream.

    Discussion in the link below reported that symbols in modules can appear
    to be before _stext on the ARM architecture, causing the offsets of this
    tracepoint to wrap. Change the offset type to s32 to fix this.
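    The wrap is easy to reproduce with plain integer arithmetic. In this
    sketch the addresses are made up and both helpers are hypothetical: an
    address 0x100 bytes below the base encodes as a huge unsigned offset,
    but as -256 when stored signed.

```c
#include <stdint.h>

/* Offset of @addr from @base (e.g. _stext), stored in 32 bits. */
static uint32_t offset_u32(uintptr_t addr, uintptr_t base)
{
	return (uint32_t)(addr - base);	/* wraps if addr < base */
}

static int32_t offset_s32(uintptr_t addr, uintptr_t base)
{
	/* Take the difference as a signed value, then narrow.
	 * Assumes |addr - base| fits in 32 bits. */
	return (int32_t)((intptr_t)addr - (intptr_t)base);
}
```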

    Link: http://lore.kernel.org/r/20191127154428.191095-1-antonio.borneo@st.com
    Link: http://lkml.kernel.org/r/20200102194625.226436-1-joel@joelfernandes.org

    Cc: Bjorn Helgaas
    Cc: David Sterba
    Cc: Ingo Molnar
    Cc: Mike Rapoport
    Cc: "Rafael J. Wysocki"
    Cc: Sakari Ailus
    Cc: Antonio Borneo
    Cc: stable@vger.kernel.org
    Fixes: d59158162e032 ("tracing: Add support for preempt and irq enable/disable events")
    Signed-off-by: Joel Fernandes (Google)
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Greg Kroah-Hartman

    Joel Fernandes (Google)
     

31 Dec, 2019

1 commit

  • [ Upstream commit 1d200e9d6f635ae894993a7d0f1b9e0b6e522e3b ]

    Fix the following compiler warnings:

    In file included from ./include/linux/bitmap.h:9,
    from ./include/linux/cpumask.h:12,
    from ./arch/x86/include/asm/cpumask.h:5,
    from ./arch/x86/include/asm/msr.h:11,
    from ./arch/x86/include/asm/processor.h:21,
    from ./arch/x86/include/asm/cpufeature.h:5,
    from ./arch/x86/include/asm/thread_info.h:53,
    from ./include/linux/thread_info.h:38,
    from ./arch/x86/include/asm/preempt.h:7,
    from ./include/linux/preempt.h:78,
    from ./include/linux/spinlock.h:51,
    from ./include/linux/mmzone.h:8,
    from ./include/linux/gfp.h:6,
    from ./include/linux/mm.h:10,
    from ./include/linux/bvec.h:13,
    from ./include/linux/blk_types.h:10,
    from block/blk-wbt.c:23:
    In function 'strncpy',
    inlined from 'perf_trace_wbt_stat' at ./include/trace/events/wbt.h:15:1:
    ./include/linux/string.h:260:9: warning: '__builtin_strncpy' specified bound 32 equals destination size [-Wstringop-truncation]
    return __builtin_strncpy(p, q, size);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    In function 'strncpy',
    inlined from 'perf_trace_wbt_lat' at ./include/trace/events/wbt.h:58:1:
    ./include/linux/string.h:260:9: warning: '__builtin_strncpy' specified bound 32 equals destination size [-Wstringop-truncation]
    return __builtin_strncpy(p, q, size);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    In function 'strncpy',
    inlined from 'perf_trace_wbt_step' at ./include/trace/events/wbt.h:87:1:
    ./include/linux/string.h:260:9: warning: '__builtin_strncpy' specified bound 32 equals destination size [-Wstringop-truncation]
    return __builtin_strncpy(p, q, size);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    In function 'strncpy',
    inlined from 'perf_trace_wbt_timer' at ./include/trace/events/wbt.h:126:1:
    ./include/linux/string.h:260:9: warning: '__builtin_strncpy' specified bound 32 equals destination size [-Wstringop-truncation]
    return __builtin_strncpy(p, q, size);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    In function 'strncpy',
    inlined from 'trace_event_raw_event_wbt_stat' at ./include/trace/events/wbt.h:15:1:
    ./include/linux/string.h:260:9: warning: '__builtin_strncpy' specified bound 32 equals destination size [-Wstringop-truncation]
    return __builtin_strncpy(p, q, size);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    In function 'strncpy',
    inlined from 'trace_event_raw_event_wbt_lat' at ./include/trace/events/wbt.h:58:1:
    ./include/linux/string.h:260:9: warning: '__builtin_strncpy' specified bound 32 equals destination size [-Wstringop-truncation]
    return __builtin_strncpy(p, q, size);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    In function 'strncpy',
    inlined from 'trace_event_raw_event_wbt_timer' at ./include/trace/events/wbt.h:126:1:
    ./include/linux/string.h:260:9: warning: '__builtin_strncpy' specified bound 32 equals destination size [-Wstringop-truncation]
    return __builtin_strncpy(p, q, size);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    In function 'strncpy',
    inlined from 'trace_event_raw_event_wbt_step' at ./include/trace/events/wbt.h:87:1:
    ./include/linux/string.h:260:9: warning: '__builtin_strncpy' specified bound 32 equals destination size [-Wstringop-truncation]
    return __builtin_strncpy(p, q, size);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
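    The warning points at a real hazard: strncpy() with a bound equal to
    the destination size leaves the buffer unterminated when the source is
    at least that long. A bounded copy that always terminates avoids it.
    This sketch uses standard snprintf(); the kernel has its own string
    helpers, and which one the patch uses is not shown here. The 32-byte
    size matches the bound in the warnings.

```c
#include <stdio.h>
#include <string.h>

#define NAME_LEN 32	/* matches the bound reported in the warnings */

/* Truncating copy that always NUL-terminates @dst. */
static void copy_name(char dst[NAME_LEN], const char *src)
{
	snprintf(dst, NAME_LEN, "%s", src);
}
```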

    Cc: Christoph Hellwig
    Cc: Ming Lei
    Cc: Hannes Reinecke
    Cc: Johannes Thumshirn
    Fixes: e34cbd307477 ("blk-wbt: add general throttling mechanism"; v4.10).
    Signed-off-by: Bart Van Assche
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Bart Van Assche
     

18 Dec, 2019

1 commit

  • [ Upstream commit c3f812cea0d7006469d1cf33a4a9f0a12bb4b3a3 ]

    The page pool keeps track of the number of pages in flight, and
    it isn't safe to remove the pool until all pages are returned.

    Disallow removing the pool until all pages are back, so the pool
    is always available for page producers.

    Make the page pool responsible for its own delayed destruction
    instead of relying on XDP, so the page pool can be used without
    the xdp memory model.

    When all pages are returned, free the pool and notify xdp if the
    pool is registered with the xdp memory system. Have the callback
    perform a table walk since some drivers (cpsw) may share the pool
    among multiple xdp_rxq_info.

    Note that the increment of pages_state_release_cnt may result in
    inflight == 0, resulting in the pool being released.
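    The in-flight count can be expressed as the difference of two
    free-running counters: one bumped when a page is handed out, one when
    it comes back. Taking the difference as a signed 32-bit value keeps it
    correct across counter wraparound. This is a hypothetical sketch of
    that idea, not the page_pool code; of the names below, only the
    release counter corresponds to one named in the commit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pool accounting: two free-running u32 counters. */
struct toy_pool {
	uint32_t hold_cnt;	/* pages handed out */
	uint32_t release_cnt;	/* pages returned */
};

static int32_t pool_inflight(const struct toy_pool *p)
{
	/* Signed distance stays right even after either counter wraps. */
	return (int32_t)(p->hold_cnt - p->release_cnt);
}

/* Returns true when this release drops in-flight to zero, i.e. the
 * point at which a pool already marked for destruction may be freed. */
static bool pool_release_page(struct toy_pool *p)
{
	p->release_cnt++;
	return pool_inflight(p) == 0;
}
```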

    Fixes: d956a048cd3f ("xdp: force mem allocator removal and periodic warning")
    Signed-off-by: Jonathan Lemon
    Acked-by: Jesper Dangaard Brouer
    Acked-by: Ilias Apalodimas
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Jonathan Lemon
     

10 Nov, 2019

1 commit

  • This removes '\n' from trace event class tcp_event_sk_skb to avoid a
    redundant blank line and make the output compact.

    Fixes: af4325ecc24f ("tcp: expose sk_state in tcp_retransmit_skb tracepoint")
    Reviewed-by: Eric Dumazet
    Reviewed-by: Yafang Shao
    Signed-off-by: Tony Lu
    Signed-off-by: David S. Miller

    Tony Lu
     

23 Oct, 2019

1 commit

  • Pull btrfs fixes from David Sterba:

    - fixes of error handling cleanup of metadata accounting with qgroups
    enabled

    - fix swapped values for qgroup tracepoints

    - fix race when handling full sync flag

    - don't start unused worker thread, functionality removed already

    * tag 'for-5.4-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
    Btrfs: check for the full sync flag while holding the inode lock during fsync
    Btrfs: fix qgroup double free after failure to reserve metadata for delalloc
    btrfs: tracepoints: Fix bad entry members of qgroup events
    btrfs: tracepoints: Fix wrong parameter order for qgroup events
    btrfs: qgroup: Always free PREALLOC META reserve in btrfs_delalloc_release_extents()
    btrfs: don't needlessly create extent-refs kernel thread
    btrfs: block-group: Fix a memory leak due to missing btrfs_put_block_group()
    Btrfs: add missing extents release on file extent cluster relocation error

    Linus Torvalds
     

17 Oct, 2019

1 commit

  • [BUG]
    For btrfs:qgroup_meta_reserve event, the trace event can output garbage:

    qgroup_meta_reserve: 9c7f6acc-b342-4037-bc47-7f6e4d2232d7: refroot=5(FS_TREE) type=DATA diff=2
    qgroup_meta_reserve: 9c7f6acc-b342-4037-bc47-7f6e4d2232d7: refroot=5(FS_TREE) type=0x258792 diff=2

    The @type can be completely garbage, as DATA type is not possible for
    trace_qgroup_meta_reserve() trace event.

    [CAUSE]
    There are several problems related to qgroup trace events:
    - Unassigned entry member
    Member entry::type of trace_qgroup_update_reserve() and
    trace_qgroup_meta_reserve() is not assigned

    - Redundant entry member
    Member entry::type is completely useless in
    trace_qgroup_meta_convert()

    Fixes: 4ee0d8832c2e ("btrfs: qgroup: Update trace events for metadata reservation")
    CC: stable@vger.kernel.org # 4.10+
    Reviewed-by: Nikolay Borisov
    Signed-off-by: Qu Wenruo
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Qu Wenruo
     

14 Oct, 2019

2 commits

  • For the sake of tcp_poll(), there are a few places where we fetch
    sk->sk_wmem_queued while this field can change from IRQ or other cpu.

    We need to add READ_ONCE() annotations, and also make sure write
    sides use corresponding WRITE_ONCE() to avoid store-tearing.

    sk_wmem_queued_add() helper is added so that we can in
    the future convert to ADD_ONCE() or equivalent if/when
    available.
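    Outside the kernel, the closest portable analogue of READ_ONCE() and
    WRITE_ONCE() is a C11 relaxed atomic load or store: each access happens
    once and cannot be torn, with no ordering implied. The sketch mirrors
    the shape of an add helper whose writers are assumed to be serialized
    by a lock, so a load plus a store suffices instead of an atomic
    read-modify-write. All names are hypothetical.

```c
#include <stdatomic.h>

/* Stand-in for a socket field that lockless readers (like tcp_poll()) peek at. */
struct toy_sock {
	_Atomic int wmem_queued;
};

/* Writer side: callers are assumed to be serialized (e.g. by a lock),
 * so a plain load + store is enough; only tearing must be prevented. */
static void wmem_queued_add(struct toy_sock *sk, int val)
{
	int cur = atomic_load_explicit(&sk->wmem_queued, memory_order_relaxed);

	atomic_store_explicit(&sk->wmem_queued, cur + val, memory_order_relaxed);
}

/* Lockless reader side: one untorn read. */
static int wmem_queued_read(struct toy_sock *sk)
{
	return atomic_load_explicit(&sk->wmem_queued, memory_order_relaxed);
}
```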

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • For the sake of tcp_poll(), there are a few places where we fetch
    sk->sk_rcvbuf while this field can change from IRQ or other cpu.

    We need to add READ_ONCE() annotations, and also make sure write
    sides use corresponding WRITE_ONCE() to avoid store-tearing.

    Note that other transports probably need similar fixes.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

07 Oct, 2019

3 commits

  • rxrpc_put_call() calls trace_rxrpc_call() after it has done the decrement
    of the refcount - which looks at the debug_id in the call record. But
    unless the refcount was reduced to zero, we no longer have the right to
    look in the record and, indeed, it may be deleted by some other thread.

    Fix this by getting the debug_id out before decrementing the refcount and
    then passing that into the tracepoint.

    Fixes: e34d4234b0b7 ("rxrpc: Trace rxrpc_call usage")
    Signed-off-by: David Howells

    David Howells
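The ordering fix used in these three rxrpc commits can be sketched with C11 atomics: copy the debug ID while a reference is still held, then decrement, then trace using only the copy. Everything here is hypothetical (`struct obj`, `obj_put()`, `trace_obj_put()`); it illustrates the pattern, not the rxrpc code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical refcounted object, modelled on an rxrpc call/conn/peer. */
struct obj {
	atomic_int usage;
	unsigned int debug_id;
};

/* Hypothetical tracepoint: records its arguments, never the object. */
static unsigned int last_traced_id;
static int last_traced_usage;

static void trace_obj_put(unsigned int debug_id, int usage)
{
	last_traced_id = debug_id;
	last_traced_usage = usage;
}

static struct obj *obj_alloc(unsigned int debug_id)
{
	struct obj *o = malloc(sizeof(*o));

	if (!o)
		return NULL;
	atomic_init(&o->usage, 1);
	o->debug_id = debug_id;
	return o;
}

static void obj_get(struct obj *o)
{
	atomic_fetch_add(&o->usage, 1);
}

/*
 * Copy debug_id while we still own a reference. After the decrement,
 * another thread may drop the last reference and free the object, so
 * nothing after this point may look inside it unless n reached zero.
 */
static void obj_put(struct obj *o)
{
	unsigned int debug_id = o->debug_id;
	int n = atomic_fetch_sub(&o->usage, 1) - 1;

	assert(n >= 0);            /* catches a refcount underflow */
	trace_obj_put(debug_id, n);
	if (n == 0)
		free(o);
}
```

Reading `o->debug_id` after the decrement is exactly the window KASAN reported as a use-after-free in the peer case.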
     
  • rxrpc_put_*conn() calls trace_rxrpc_conn() after they have done the
    decrement of the refcount - which looks at the debug_id in the connection
    record. But unless the refcount was reduced to zero, we no longer have the
    right to look in the record and, indeed, it may be deleted by some other
    thread.

    Fix this by getting the debug_id out before decrementing the refcount and
    then passing that into the tracepoint.

    Fixes: 363deeab6d0f ("rxrpc: Add connection tracepoint and client conn state tracepoint")
    Signed-off-by: David Howells

    David Howells
     
  • rxrpc_put_peer() calls trace_rxrpc_peer() after it has done the decrement
    of the refcount - which looks at the debug_id in the peer record. But
    unless the refcount was reduced to zero, we no longer have the right to
    look in the record and, indeed, it may be deleted by some other thread.

    Fix this by getting the debug_id out before decrementing the refcount and
    then passing that into the tracepoint.

    This can cause the following symptoms:

    BUG: KASAN: use-after-free in __rxrpc_put_peer net/rxrpc/peer_object.c:411 [inline]
    BUG: KASAN: use-after-free in rxrpc_put_peer+0x685/0x6a0 net/rxrpc/peer_object.c:435
    Read of size 8 at addr ffff888097ec0058 by task syz-executor823/24216

    Fixes: 1159d4b496f5 ("rxrpc: Add a tracepoint to track rxrpc_peer refcounting")
    Reported-by: syzbot+b9be979c55f2bea8ed30@syzkaller.appspotmail.com
    Signed-off-by: David Howells

    David Howells
     

05 Oct, 2019

2 commits

  • Pull networking fixes from David Miller:

    1) Fix ieee802154 atusb driver use-after-free, from Johan Hovold.

    2) Need to validate TCA_CBQ_WRROPT netlink attributes, from Eric
    Dumazet.

    3) txq null deref in mac80211, from Miaoqing Pan.

    4) ionic driver needs to select NET_DEVLINK, from Arnd Bergmann.

    5) Need to disable bh during nft_connlimit GC, from Pablo Neira Ayuso.

    6) Avoid division by zero in taprio scheduler, from Vladimir Oltean.

    7) Various xgmac fixes in stmmac driver from Jose Abreu.

    8) Avoid 64-bit division in mlx5, which led to link errors on 32-bit,
    from Michal Kubecek.

    9) Fix bad VLAN check in rtl8366 DSA driver, from Linus Walleij.

    10) Fix sleep while atomic in sja1105, from Vladimir Oltean.

    11) Suspend/resume deadlock in stmmac, from Thierry Reding.

    12) Various UDP GSO fixes from Josh Hunt.

    13) Fix slab out of bounds access in tcp_zerocopy_receive(), from Eric
    Dumazet.

    14) Fix OOPS in __ipv6_ifa_notify(), from David Ahern.

    15) Memory leak in NFC's llcp_sock_bind, from Eric Dumazet.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (72 commits)
    selftests/net: add nettest to .gitignore
    net: qlogic: Fix memory leak in ql_alloc_large_buffers
    nfc: fix memory leak in llcp_sock_bind()
    sch_dsmark: fix potential NULL deref in dsmark_init()
    net: phy: at803x: use operating parameters from PHY-specific status
    net: phy: extract pause mode
    net: phy: extract link partner advertisement reading
    net: phy: fix write to mii-ctrl1000 register
    ipv6: Handle missing host route in __ipv6_ifa_notify
    net: phy: allow for reset line to be tied to a sleepy GPIO controller
    net: ipv4: avoid mixed n_redirects and rate_tokens usage
    r8152: Set macpassthru in reset_resume callback
    cxgb4:Fix out-of-bounds MSI-X info array access
    Revert "ipv6: Handle race in addrconf_dad_work"
    net: make sock_prot_memory_pressure() return "const char *"
    rxrpc: Fix rxrpc_recvmsg tracepoint
    qmi_wwan: add support for Cinterion CLS8 devices
    tcp: fix slab-out-of-bounds in tcp_zerocopy_receive()
    lib: textsearch: fix escapes in example code
    udp: only do GSO if # of segs > 1
    ...

    Linus Torvalds
     
  • Fix the rxrpc_recvmsg tracepoint to handle being called with a NULL call
    parameter.

    Fixes: a25e21f0bcd2 ("rxrpc, afs: Use debug_ids rather than pointers in traces")
    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    David Howells
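One way to make a tracepoint tolerate a NULL object is to guard the field assignment and record a placeholder ID. This sketch assumes hypothetical `struct call_sketch`, `struct recvmsg_entry` and `trace_recvmsg_assign()` names, and uses 0 as the "no call" value; it shows the idea, not the exact upstream change.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical call record carrying only a debug ID. */
struct call_sketch {
	unsigned int debug_id;
};

/* Hypothetical trace entry filled in by the tracepoint. */
struct recvmsg_entry {
	unsigned int call;   /* debug_id, or 0 when there is no call */
	int ret;
};

/* Guarded assignment: the tracepoint may legitimately be passed NULL. */
static void trace_recvmsg_assign(struct recvmsg_entry *e,
				 const struct call_sketch *call, int ret)
{
	assert(e != NULL);
	e->call = call ? call->debug_id : 0;
	e->ret = ret;
}
```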
     

01 Oct, 2019

1 commit

  • Pull tracing fixes from Steven Rostedt:
    "A few more tracing fixes:

    - Fix a buffer overflow by checking nr_args correctly in probes

    - Fix a warning that is reported by clang

    - Fix a possible memory leak in error path of filter processing

    - Fix the selftest that checks for failures, but wasn't failing

    - Minor clean up on call site output of a memory trace event"

    * tag 'trace-v5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    selftests/ftrace: Fix same probe error test
    mm, tracing: Print symbol name for call_site in trace events
    tracing: Have error path in predicate_parse() free its allocated memory
    tracing: Fix clang -Wint-in-bool-context warnings in IF_ASSIGN macro
    tracing/probe: Fix to check the difference of nr_args before adding probe

    Linus Torvalds
     

29 Sep, 2019

1 commit

  • To improve the readability of raw slab trace points, print the call_site ip
    using '%pS'. Then we can grep events with function names.

    [002] .... 808.188897: kmem_cache_free: call_site=putname+0x47/0x50 ptr=00000000cef40c80
    [002] .... 808.188898: kfree: call_site=security_cred_free+0x42/0x50 ptr=0000000062400820
    [002] .... 808.188904: kmem_cache_free: call_site=put_cred_rcu+0x88/0xa0 ptr=0000000058d74ef8
    [002] .... 808.188913: kmem_cache_alloc: call_site=prepare_creds+0x26/0x100 ptr=0000000058d74ef8 bytes_req=168 bytes_alloc=576 gfp_flags=GFP_KERNEL
    [002] .... 808.188917: kmalloc: call_site=security_prepare_creds+0x77/0xa0 ptr=0000000062400820 bytes_req=8 bytes_alloc=336 gfp_flags=GFP_KERNEL|__GFP_ZERO
    [002] .... 808.188920: kmem_cache_alloc: call_site=getname_flags+0x4f/0x1e0 ptr=00000000cef40c80 bytes_req=4096 bytes_alloc=4480 gfp_flags=GFP_KERNEL
    [002] .... 808.188925: kmem_cache_free: call_site=putname+0x47/0x50 ptr=00000000cef40c80
    [002] .... 808.188926: kfree: call_site=security_cred_free+0x42/0x50 ptr=0000000062400820
    [002] .... 808.188931: kmem_cache_free: call_site=put_cred_rcu+0x88/0xa0 ptr=0000000058d74ef8

    Link: http://lkml.kernel.org/r/20190914103215.23301-1-changbin.du@gmail.com

    Signed-off-by: Changbin Du
    Signed-off-by: Steven Rostedt (VMware)

    Changbin Du
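The `call_site=putname+0x47/0x50` values above have the shape `symbol+offset/size`. The kernel resolves addresses for '%pS' through kallsyms; the sketch below only reproduces the output format from a small hypothetical symbol table (`struct ksym` and `format_symbol()` are illustrative names, and the addresses are made up).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical symbol table entry: start address, size and name. */
struct ksym {
	unsigned long start;
	unsigned long size;
	const char *name;
};

/*
 * Format addr as "name+0xoff/0xsize". Falls back to the raw hex address
 * when no symbol in the table covers it.
 */
static int format_symbol(char *buf, size_t len, unsigned long addr,
			 const struct ksym *syms, size_t nsyms)
{
	assert(buf != NULL && len > 0);
	for (size_t i = 0; i < nsyms; i++) {
		if (addr >= syms[i].start && addr - syms[i].start < syms[i].size)
			return snprintf(buf, len, "%s+0x%lx/0x%lx", syms[i].name,
					addr - syms[i].start, syms[i].size);
	}
	return snprintf(buf, len, "0x%lx", addr);
}
```

A symbolic call site like this is what makes it possible to grep the trace output by function name.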
     

27 Sep, 2019

1 commit

  • Pull NFS client updates from Anna Schumaker:
    "Stable bugfixes:
    - Dequeue the request from the receive queue while we're re-encoding # v4.20+
    - Fix buffer handling of GSS MIC without slack # 5.1

    Features:
    - Increase xprtrdma maximum transport header and slot table sizes
    - Add support for nfs4_call_sync() calls using a custom
    rpc_task_struct
    - Optimize the default readahead size
    - Enable pNFS filelayout LAYOUTGET on OPEN

    Other bugfixes and cleanups:
    - Fix possible null-pointer dereferences and memory leaks
    - Various NFS over RDMA cleanups
    - Various NFS over RDMA comment updates
    - Don't receive TCP data into a reset request buffer
    - Don't try to parse incomplete RPC messages
    - Fix congestion window race with disconnect
    - Clean up pNFS return-on-close error handling
    - Fixes for NFS4ERR_OLD_STATEID handling"

    * tag 'nfs-for-5.4-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (53 commits)
    pNFS/filelayout: enable LAYOUTGET on OPEN
    NFS: Optimise the default readahead size
    NFSv4: Handle NFS4ERR_OLD_STATEID in LOCKU
    NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE
    NFSv4: Fix OPEN_DOWNGRADE error handling
    pNFS: Handle NFS4ERR_OLD_STATEID on layoutreturn by bumping the state seqid
    NFSv4: Add a helper to increment stateid seqids
    NFSv4: Handle RPC level errors in LAYOUTRETURN
    NFSv4: Handle NFS4ERR_DELAY correctly in return-on-close
    NFSv4: Clean up pNFS return-on-close error handling
    pNFS: Ensure we do clear the return-on-close layout stateid on fatal errors
    NFS: remove unused check for negative dentry
    NFSv3: use nfs_add_or_obtain() to create and reference inodes
    NFS: Refactor nfs_instantiate() for dentry referencing callers
    SUNRPC: Fix congestion window race with disconnect
    SUNRPC: Don't try to parse incomplete RPC messages
    SUNRPC: Rename xdr_buf_read_netobj to xdr_buf_read_mic
    SUNRPC: Fix buffer handling of GSS MIC without slack
    SUNRPC: RPC level errors should always set task->tk_rpc_status
    SUNRPC: Don't receive TCP data into a request buffer that has been reset
    ...

    Linus Torvalds
     

26 Sep, 2019

1 commit

  • There are many of those warnings.

    In file included from ./arch/powerpc/include/asm/paca.h:15,
    from ./arch/powerpc/include/asm/current.h:13,
    from ./include/linux/thread_info.h:21,
    from ./include/asm-generic/preempt.h:5,
    from ./arch/powerpc/include/generated/asm/preempt.h:1,
    from ./include/linux/preempt.h:78,
    from ./include/linux/spinlock.h:51,
    from fs/fs-writeback.c:19:
    In function 'strncpy',
    inlined from 'perf_trace_writeback_page_template' at
    ./include/trace/events/writeback.h:56:1:
    ./include/linux/string.h:260:9: warning: '__builtin_strncpy' specified
    bound 32 equals destination size [-Wstringop-truncation]
    return __builtin_strncpy(p, q, size);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    Fix it by using strscpy_pad() instead of strncpy(). The new helper,
    introduced in "lib/string: Add strscpy_pad() function", always
    NUL-terminates the destination and zero-pads the rest of it. Also,
    replace strlcpy() with strscpy_pad() in this file for consistency.

    Link: http://lkml.kernel.org/r/1564075099-27750-1-git-send-email-cai@lca.pw
    Fixes: 455b2864686d ("writeback: Initial tracing support")
    Fixes: 028c2dd184c0 ("writeback: Add tracing to balance_dirty_pages")
    Fixes: e84d0a4f8e39 ("writeback: trace event writeback_queue_io")
    Fixes: b48c104d2211 ("writeback: trace event bdi_dirty_ratelimit")
    Fixes: cc1676d917f3 ("writeback: Move requeueing when I_SYNC set to writeback_sb_inodes()")
    Fixes: 9fb0a7da0c52 ("writeback: add more tracepoints")
    Signed-off-by: Qian Cai
    Reviewed-by: Jan Kara
    Cc: Tobin C. Harding
    Cc: Steven Rostedt (VMware)
    Cc: Ingo Molnar
    Cc: Tejun Heo
    Cc: Dave Chinner
    Cc: Fengguang Wu
    Cc: Jens Axboe
    Cc: Joe Perches
    Cc: Kees Cook
    Cc: Jann Horn
    Cc: Jonathan Corbet
    Cc: Nitin Gote
    Cc: Rasmus Villemoes
    Cc: Stephen Kitt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Qian Cai
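The behaviour that makes strscpy_pad() a safe replacement for strncpy() here can be sketched in userspace: copy at most count - 1 bytes, always NUL-terminate, zero the remainder, and return the copied length or -E2BIG on truncation (or when count is 0). `strscpy_pad_sketch()` is a hypothetical re-implementation of those semantics, not the kernel source; it returns `long` rather than the kernel's `ssize_t` to stay within ISO C.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

static long strscpy_pad_sketch(char *dest, const char *src, size_t count)
{
	size_t len;

	assert(dest != NULL && src != NULL);
	if (count == 0)
		return -E2BIG;

	/* Length of src, but never look further than count bytes. */
	for (len = 0; len < count && src[len] != '\0'; len++)
		;

	if (len == count) {
		/* src plus its NUL does not fit: truncate and terminate. */
		memcpy(dest, src, count - 1);
		dest[count - 1] = '\0';
		return -E2BIG;
	}

	memcpy(dest, src, len);
	/* Terminating NUL plus zero padding up to the end of dest. */
	memset(dest + len, 0, count - len);
	return (long)len;
}
```

Unlike strncpy() with a bound equal to the destination size, the truncated case still ends in a NUL, which is what the -Wstringop-truncation warning was pointing at; the zero padding also avoids copying stale bytes into fixed-size trace fields.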
     

19 Sep, 2019

4 commits

  • Pull btrfs updates from David Sterba:
    "This continues with work on code refactoring, sanity checks and space
    handling. There are some less user visible changes, nothing that would
    particularly stand out.

    User visible changes:
    - tree checker, more sanity checks of:
      - ROOT_ITEM (key, size, generation, level, alignment, flags)
      - EXTENT_ITEM and METADATA_ITEM checks (key, size, offset,
        alignment, refs)
      - tree block reference items
      - EXTENT_DATA_REF (key, hash, offset)

    - deprecate flag BTRFS_SUBVOL_CREATE_ASYNC for subvolume creation
    ioctl, scheduled removal in 5.7

    - delete stale and unused UAPI definitions
    BTRFS_DEV_REPLACE_ITEM_STATE_*

    - improved export of debugging information available via existing
    sysfs directory structure

    - try harder to delete relations between qgroups and allow to delete
    orphan entries

    - remove unreliable space checks before relocation starts

    Core:
    - space handling:
      - improved ticket reservations and other high level logic in
        order to remove special cases
      - factor flushing infrastructure and use it for different
        contexts, allows to remove some special case handling
      - reduce metadata reservation when only updating inodes
      - reduce global block reserve minimum size (affects small
        filesystems)
      - improved overcommit logic wrt global block reserve

    - tests:
      - fix memory leaks in extent IO tree
      - catch all TRIM range

    Fixes:
    - fix ENOSPC errors, leading to transaction aborts, when cloning
    extents

    - several fixes for inode number cache (mount option inode_cache)

    - fix potential soft lockups during send when traversing large trees

    - fix unaligned access to space cache pages with SLUB debug on
    (PowerPC)

    Other:
    - refactoring public/private functions, moving to new or more
    appropriate files

    - defines converted to enums

    - error handling improvements

    - more assertions and comments

    - old code deletion"

    * tag 'for-5.4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (138 commits)
    btrfs: Relinquish CPUs in btrfs_compare_trees
    btrfs: Don't assign retval of btrfs_try_tree_write_lock/btrfs_tree_read_lock_atomic
    btrfs: create structure to encode checksum type and length
    btrfs: turn checksum type define into an enum
    btrfs: add enospc debug messages for ticket failure
    btrfs: do not account global reserve in can_overcommit
    btrfs: use btrfs_try_granting_tickets in update_global_rsv
    btrfs: always reserve our entire size for the global reserve
    btrfs: change the minimum global reserve size
    btrfs: rename btrfs_space_info_add_old_bytes
    btrfs: remove orig_bytes from reserve_ticket
    btrfs: fix may_commit_transaction to deal with no partial filling
    btrfs: rework wake_all_tickets
    btrfs: refactor the ticket wakeup code
    btrfs: stop partially refilling tickets when releasing space
    btrfs: add space reservation tracepoint for reserved bytes
    btrfs: roll tracepoint into btrfs_space_info_update helper
    btrfs: do not allow reservations if we have pending tickets
    btrfs: stop clearing EXTENT_DIRTY in inode I/O tree
    btrfs: treat RWF_{,D}SYNC writes as sync for CRCs
    ...

    Linus Torvalds
     
  • Pull file locking updates from Jeff Layton:
    "Just a couple of minor bugfixes, a revision to a tracepoint to account
    for some earlier changes to the internals, and a patch to add a
    pr_warn message when someone tries to mount a filesystem with '-o
    mand' on a kernel that has that support disabled"

    * tag 'filelock-v5.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
    locks: fix a memory leak bug in __break_lease()
    locks: print a warning when mount fails due to lack of "mand" support
    locks: Fix procfs output for file leases
    locks: revise generic_add_lease tracepoint

    Linus Torvalds
     
  • Pull networking updates from David Miller:

    1) Support IPV6 RA Captive Portal Identifier, from Maciej Żenczykowski.

    2) Use bio_vec in the networking stack instead of the custom skb_frag_t,
    from Matthew Wilcox.

    3) Make use of xmit_more in r8169 driver, from Heiner Kallweit.

    4) Add devmap_hash to xdp, from Toke Høiland-Jørgensen.

    5) Support all variants of 5750X bnxt_en chips, from Michael Chan.

    6) More RTNL avoidance work in the core and mlx5 driver, from Vlad
    Buslov.

    7) Add TCP syn cookies bpf helper, from Petar Penkov.

    8) Add 'nettest' to selftests and use it, from David Ahern.

    9) Add extack support to drop_monitor, add packet alert mode and
    support for HW drops, from Ido Schimmel.

    10) Add VLAN offload to stmmac, from Jose Abreu.

    11) Lots of devm_platform_ioremap_resource() conversions, from
    YueHaibing.

    12) Add IONIC driver, from Shannon Nelson.

    13) Several kTLS cleanups, from Jakub Kicinski.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1930 commits)
    mlxsw: spectrum_buffers: Add the ability to query the CPU port's shared buffer
    mlxsw: spectrum: Register CPU port with devlink
    mlxsw: spectrum_buffers: Prevent changing CPU port's configuration
    net: ena: fix incorrect update of intr_delay_resolution
    net: ena: fix retrieval of nonadaptive interrupt moderation intervals
    net: ena: fix update of interrupt moderation register
    net: ena: remove all old adaptive rx interrupt moderation code from ena_com
    net: ena: remove ena_restore_ethtool_params() and relevant fields
    net: ena: remove old adaptive interrupt moderation code from ena_netdev
    net: ena: remove code duplication in ena_com_update_nonadaptive_moderation_interval _*()
    net: ena: enable the interrupt_moderation in driver_supported_features
    net: ena: reimplement set/get_coalesce()
    net: ena: switch to dim algorithm for rx adaptive interrupt moderation
    net: ena: add intr_moder_rx_interval to struct ena_com_dev and use it
    net: phy: adin: implement Energy Detect Powerdown mode via phy-tunable
    ethtool: implement Energy Detect Powerdown support via phy-tunable
    xen-netfront: do not assume sk_buff_head list is empty in error handling
    s390/ctcm: Delete unnecessary checks before the macro call “dev_kfree_skb”
    net: ena: don't wake up tx queue when down
    drop_monitor: Better sanitize notified packets
    ...

    Linus Torvalds
     
  • Pull staging and IIO driver updates from Greg KH:
    "Here is the big staging/iio driver update for 5.4-rc1.

    Lots of churn here, with a few driver/filesystems moving out of
    staging finally:

    - erofs moved out of staging

    - greybus core code moved out of staging

    Along with that, a new filesystem has been added:

    - exfat

    to provide support for those devices requiring that filesystem (i.e.
    transfer devices to/from windows systems or printers)

    Other than that, there are a number of new IIO drivers, and lots and lots
    and lots of staging driver cleanups and minor fixes as people continue
    to dig into those for easy changes.

    All of these have been in linux-next for a while with no reported
    issues"

    * tag 'staging-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (453 commits)
    Staging: gasket: Use temporaries to reduce line length.
    Staging: octeon: Avoid several usecases of strcpy
    staging: vhciq_core: replace snprintf with scnprintf
    staging: wilc1000: avoid twice IRQ handler execution for each single interrupt
    staging: wilc1000: remove unused interrupt status handling code
    staging: fbtft: make several arrays static const, makes object smaller
    staging: rtl8188eu: make two arrays static const, makes object smaller
    staging: rtl8723bs: core: Remove Macro "IS_MAC_ADDRESS_BROADCAST"
    dt-bindings: anybus-controller: move to staging/ tree
    staging: emxx_udc: remove local TRUE/FALSE definition
    staging: wilc1000: look for rtc_clk clock
    staging: dt-bindings: wilc1000: add optional rtc_clk property
    staging: nvec: make use of devm_platform_ioremap_resource
    staging: exfat: drop unused function parameter
    Staging: exfat: Avoid use of strcpy
    staging: exfat: use integer constants
    staging: exfat: cleanup spacing for casts
    staging: exfat: cleanup spacing for operators
    staging: rtl8723bs: hal: remove redundant variable n
    staging: pi433: Fix typo in documentation
    ...

    Linus Torvalds
     

18 Sep, 2019

2 commits

  • Pull power management updates from Rafael Wysocki:
    "These include a rework of the main suspend-to-idle code flow (related
    to the handling of spurious wakeups), a switch over of several users
    of cpufreq notifiers to QoS-based limits, a new devfreq driver for
    Tegra20, a new cpuidle driver and governor for virtualized guests, an
    extension of the wakeup sources framework to expose wakeup sources as
    device objects in sysfs, and more.

    Specifics:

    - Rework the main suspend-to-idle control flow to avoid repeating
    "noirq" device resume and suspend operations in case of spurious
    wakeups from the ACPI EC and decouple the ACPI EC wakeups support
    from the LPS0 _DSM support (Rafael Wysocki).

    - Extend the wakeup sources framework to expose wakeup sources as
    device objects in sysfs (Tri Vo, Stephen Boyd).

    - Expose system suspend statistics in sysfs (Kalesh Singh).

    - Introduce a new haltpoll cpuidle driver and a new matching governor
    for virtualized guests wanting to do guest-side polling in the idle
    loop (Marcelo Tosatti, Joao Martins, Wanpeng Li, Stephen Rothwell).

    - Fix the menu and teo cpuidle governors to allow the scheduler tick
    to be stopped if PM QoS is used to limit the CPU idle state exit
    latency in some cases (Rafael Wysocki).

    - Increase the resolution of the play_idle() argument to microseconds
    for more fine-grained injection of CPU idle cycles (Daniel
    Lezcano).

    - Switch over some users of cpufreq notifiers to the new QoS-based
    frequency limits and drop the CPUFREQ_ADJUST and CPUFREQ_NOTIFY
    policy notifier events (Viresh Kumar).

    - Add new cpufreq driver based on nvmem for sun50i (Yangtao Li).

    - Add support for MT8183 and MT8516 to the mediatek cpufreq driver
    (Andrew-sh.Cheng, Fabien Parent).

    - Add i.MX8MN support to the imx-cpufreq-dt cpufreq driver (Anson
    Huang).

    - Add qcs404 to cpufreq-dt-platdev blacklist (Jorge Ramirez-Ortiz).

    - Update the qcom cpufreq driver (among other things, to make it
    easier to extend and to use kryo cpufreq for other nvmem-based
    SoCs) and add qcs404 support to it (Niklas Cassel, Douglas
    RAILLARD, Sibi Sankar, Sricharan R).

    - Fix assorted issues and make assorted minor improvements in the
    cpufreq code (Colin Ian King, Douglas RAILLARD, Florian Fainelli,
    Gustavo Silva, Hariprasad Kelam).

    - Add new devfreq driver for NVidia Tegra20 (Dmitry Osipenko, Arnd
    Bergmann).

    - Add new Exynos PPMU events to devfreq events and extend that
    mechanism (Lukasz Luba).

    - Fix and clean up the exynos-bus devfreq driver (Kamil Konieczny).

    - Improve devfreq documentation and governor code, fix spelling typos
    in devfreq (Ezequiel Garcia, Krzysztof Kozlowski, Leonard Crestez,
    MyungJoo Ham, Gaël PORTAY).

    - Add regulators enable and disable to the OPP (operating performance
    points) framework (Kamil Konieczny).

    - Update the OPP framework to support multiple opp-suspend properties
    (Anson Huang).

    - Fix assorted issues and make assorted minor improvements in the OPP
    code (Niklas Cassel, Viresh Kumar, Yue Hu).

    - Clean up the generic power domains (genpd) framework (Ulf Hansson).

    - Clean up assorted pieces of power management code and documentation
    (Akinobu Mita, Amit Kucheria, Chuhong Yuan).

    - Update the pm-graph tool to version 5.5 including multiple fixes
    and improvements (Todd Brandt).

    - Update the cpupower utility (Benjamin Weis, Geert Uytterhoeven,
    Sébastien Szymanski)"

    * tag 'pm-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (126 commits)
    cpuidle-haltpoll: Enable kvm guest polling when dedicated physical CPUs are available
    cpuidle-haltpoll: do not set an owner to allow modunload
    cpuidle-haltpoll: return -ENODEV on modinit failure
    cpuidle-haltpoll: set haltpoll as preferred governor
    cpuidle: allow governor switch on cpuidle_register_driver()
    PM: runtime: Documentation: add runtime_status ABI document
    pm-graph: make setVal unbuffered again for python2 and python3
    powercap: idle_inject: Use higher resolution for idle injection
    cpuidle: play_idle: Increase the resolution to usec
    cpuidle-haltpoll: vcpu hotplug support
    cpufreq: Add qcs404 to cpufreq-dt-platdev blacklist
    cpufreq: qcom: Add support for qcs404 on nvmem driver
    cpufreq: qcom: Refactor the driver to make it easier to extend
    cpufreq: qcom: Re-organise kryo cpufreq to use it for other nvmem based qcom socs
    dt-bindings: opp: Add qcom-opp bindings with properties needed for CPR
    dt-bindings: opp: qcom-nvmem: Support pstates provided by a power domain
    Documentation: cpufreq: Update policy notifier documentation
    cpufreq: Remove CPUFREQ_ADJUST and CPUFREQ_NOTIFY policy notifier events
    PM / Domains: Verify PM domain type in dev_pm_genpd_set_performance_state()
    PM / Domains: Simplify genpd_lookup_dev()
    ...

    Linus Torvalds
     
  • Pull block updates from Jens Axboe:

    - Two NVMe pull requests:
      - ana log parse fix from Anton
      - nvme quirks support for Apple devices from Ben
      - fix missing bio completion tracing for multipath stack devices
        from Hannes and Mikhail
      - IP TOS settings for nvme rdma and tcp transports from Israel
      - rq_dma_dir cleanups from Israel
      - tracing for Get LBA Status command from Minwoo
      - Some nvme-tcp cleanups from Minwoo, Potnuri and Myself
      - Some consolidation between the fabrics transports for handling
        the CAP register
      - reset race with ns scanning fix for fabrics (move fabrics
        commands to a dedicated request queue with a different lifetime
        from the admin request queue)
      - controller reset and namespace scan races fixes
      - nvme discovery log change uevent support
      - naming improvements from Keith
      - multiple discovery controllers reject fix from James
      - some regular cleanups from various people

    - Series fixing (and re-fixing) null_blk debug printing and nr_devices
    checks (André)

    - A few pull requests from Song, with fixes from Andy, Guoqing,
    Guilherme, Neil, Nigel, and Yufen.

    - REQ_OP_ZONE_RESET_ALL support (Chaitanya)

    - Bio merge handling unification (Christoph)

    - Pick default elevator correctly for devices with special needs
    (Damien)

    - Block stats fixes (Hou)

    - Timeout and support devices nbd fixes (Mike)

    - Series fixing races around elevator switching and device add/remove
    (Ming)

    - sed-opal cleanups (Revanth)

    - Per device weight support for BFQ (Fam)

    - Support for blk-iocost, a new model that can properly account the
    cost of IO workloads (Tejun)

    - blk-cgroup writeback fixes (Tejun)

    - paride queue init fixes (zhengbin)

    - blk_set_runtime_active() cleanup (Stanley)

    - Block segment mapping optimizations (Bart)

    - lightnvm fixes (Hans/Minwoo/YueHaibing)

    - Various little fixes and cleanups

    * tag 'for-5.4/block-2019-09-16' of git://git.kernel.dk/linux-block: (186 commits)
    null_blk: format pr_* logs with pr_fmt
    null_blk: match the type of parameter nr_devices
    null_blk: do not fail the module load with zero devices
    block: also check RQF_STATS in blk_mq_need_time_stamp()
    block: make rq sector size accessible for block stats
    bfq: Fix bfq linkage error
    raid5: use bio_end_sector in r5_next_bio
    raid5: remove STRIPE_OPS_REQ_PENDING
    md: add feature flag MD_FEATURE_RAID0_LAYOUT
    md/raid0: avoid RAID0 data corruption due to layout confusion.
    raid5: don't set STRIPE_HANDLE to stripe which is in batch list
    raid5: don't increment read_errors on EILSEQ return
    nvmet: fix a wrong error status returned in error log page
    nvme: send discovery log page change events to userspace
    nvme: add uevent variables for controller devices
    nvme: enable aen regardless of the presence of I/O queues
    nvme-fabrics: allow discovery subsystems accept a kato
    nvmet: Use PTR_ERR_OR_ZERO() in nvmet_init_discovery()
    nvme: Remove redundant assignment of cq vector
    nvme: Assign subsys instance from first ctrl
    ...

    Linus Torvalds
     

17 Sep, 2019

2 commits

  • * pm-opp:
    PM / OPP: Correct Documentation about library location
    opp: of: Support multiple suspend OPPs defined in DT
    dt-bindings: opp: Support multiple opp-suspend properties
    opp: core: add regulators enable and disable
    opp: Don't decrement uninitialized list_kref

    * pm-qos:
    PM: QoS: Get rid of unused flags

    * acpi-pm:
    ACPI: PM: Print debug messages on device power state changes

    * pm-domains:
    PM / Domains: Verify PM domain type in dev_pm_genpd_set_performance_state()
    PM / Domains: Simplify genpd_lookup_dev()
    PM / Domains: Align in-parameter names for some genpd functions

    * pm-tools:
    pm-graph: make setVal unbuffered again for python2 and python3
    cpupower: update German translation
    tools/power/cpupower: fix 64bit detection when cross-compiling
    cpupower: Add missing newline at end of file
    pm-graph v5.5

    Rafael J. Wysocki
     
  • Pull RCU updates from Ingo Molnar:
    "This cycle's RCU changes were:

    - A few more RCU flavor consolidation cleanups.

    - Updates to RCU's list-traversal macros improving lockdep usability.

    - Forward-progress improvements for no-CBs CPUs: Avoid ignoring
    incoming callbacks during grace-period waits.

    - Forward-progress improvements for no-CBs CPUs: Use ->cblist
    structure to take advantage of others' grace periods.

    - Also added a small commit that avoids needlessly inflicting
    scheduler-clock ticks on callback-offloaded CPUs.

    - Forward-progress improvements for no-CBs CPUs: Reduce contention on
    ->nocb_lock guarding ->cblist.

    - Forward-progress improvements for no-CBs CPUs: Add ->nocb_bypass
    list to further reduce contention on ->nocb_lock guarding ->cblist.

    - Miscellaneous fixes.

    - Torture-test updates.

    - minor LKMM updates"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (86 commits)
    MAINTAINERS: Update from paulmck@linux.ibm.com to paulmck@kernel.org
    rcu: Don't include in rcutiny.h
    rcu: Allow rcu_do_batch() to dynamically adjust batch sizes
    rcu/nocb: Don't wake no-CBs GP kthread if timer posted under overload
    rcu/nocb: Reduce __call_rcu_nocb_wake() leaf rcu_node ->lock contention
    rcu/nocb: Reduce nocb_cb_wait() leaf rcu_node ->lock contention
    rcu/nocb: Advance CBs after merge in rcutree_migrate_callbacks()
    rcu/nocb: Avoid synchronous wakeup in __call_rcu_nocb_wake()
    rcu/nocb: Print no-CBs diagnostics when rcutorture writer unduly delayed
    rcu/nocb: EXP Check use and usefulness of ->nocb_lock_contended
    rcu/nocb: Add bypass callback queueing
    rcu/nocb: Atomic ->len field in rcu_segcblist structure
    rcu/nocb: Unconditionally advance and wake for excessive CBs
    rcu/nocb: Reduce ->nocb_lock contention with separate ->nocb_gp_lock
    rcu/nocb: Reduce contention at no-CBs invocation-done time
    rcu/nocb: Reduce contention at no-CBs registry-time CB advancement
    rcu/nocb: Round down for number of no-CBs grace-period kthreads
    rcu/nocb: Avoid ->nocb_lock capture by corresponding CPU
    rcu/nocb: Avoid needless wakeups of no-CBs grace-period kthread
    rcu/nocb: Make __call_rcu_nocb_wake() safe for many callbacks
    ...

    Linus Torvalds