02 Dec, 2017

4 commits

  • The rds_tcp_kill_sock() function parses the rds_tcp_conn_list
    to find the rds_connection entries marked for deletion as part
    of the netns deletion under the protection of the rds_tcp_conn_lock.
    Since the rds_tcp_conn_list tracks rds_tcp_connections (which
    have a 1:1 mapping with rds_conn_path), multiple tc entries in
    the rds_tcp_conn_list will map to a single rds_connection, and will
    be deleted as part of the rds_conn_destroy() operation that is
    done outside the rds_tcp_conn_lock.

    The rds_tcp_conn_list traversal done under the protection of
    rds_tcp_conn_lock should not leave any doomed tc entries in
    the list after the rds_tcp_conn_lock is released, else another
    concurrently executing netns delete (for a different netns) thread
    may trip on these entries.

    Reported-by: syzbot
    Signed-off-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     
  • Commit 8edc3affc077 ("rds: tcp: Take explicit refcounts on struct net")
    introduces a regression in rds-tcp netns cleanup. cleanup_net()
    (and thus the rds_tcp_dev_event notification) is only called from put_net()
    when all netns refcounts go to 0, but this cannot happen if the
    rds_connection itself is holding a c_net ref that it expects to
    release in rds_tcp_kill_sock.

    Instead, the rds_tcp_kill_sock callback should make sure to
    tear down state carefully, ensuring that the socket teardown
    is only done after all data-structures and workqs that depend
    on it are quiesced.

    The original motivation for commit 8edc3affc077 ("rds: tcp: Take explicit
    refcounts on struct net") was to resolve a race condition reported by
    syzkaller where workqs for tx/rx/connect were triggered after the
    namespace was deleted. Those worker threads should have been
    cancelled/flushed before socket tear-down and indeed,
    rds_conn_path_destroy() does try to sequence this by doing
    /* cancel cp_send_w */
    /* cancel cp_recv_w */
    /* flush cp_down_w */
    /* free data structures */
    Here the "flush cp_down_w" will trigger rds_conn_shutdown and thus
    invoke rds_tcp_conn_path_shutdown() to close the tcp socket, so that
    we ought to have satisfied the requirement that "socket-close is
    done after all other dependent state is quiesced". However,
    rds_conn_shutdown has a bug in that it *always* triggers the reconnect
    workq (and if the connection is successful, we always restart the tx/rx
    workqs, so with the right timing we risk the race conditions reported
    by syzkaller).

    Netns deletion is like module teardown: no need to restart a
    reconnect in this case. We can use the c_destroy_in_prog bit
    to avoid restarting the reconnect.

    Fixes: 8edc3affc077 ("rds: tcp: Take explicit refcounts on struct net")
    Signed-off-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     
  • A side-effect of Commit c14b0366813a ("rds: tcp: set linger to 1
    when unloading a rds-tcp") is that we always send a RST on the tcp
    connection for rds_conn_destroy(), so rds_tcp_conn_paths_destroy()
    is not needed any more and is removed in this patch.

    Signed-off-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     
  • When sending node local messages the code is using an 'mtu' of 66060
    bytes to avoid unnecessary fragmentation. During situations of low
    memory tipc_msg_build() may sometimes fail to allocate such large
    buffers, resulting in unnecessary send failures. This can easily be
    remedied by falling back to a smaller MTU, and then reassembling the
    buffer chain as if the message were arriving from a remote node.

    At the same time, we change the initial MTU setting of the broadcast
    link to a lower value, so that large messages are always fragmented
    into smaller buffers even when we run in single node mode. Apart from
    obtaining the same advantage as for the 'fallback' solution above, this
    turns out to give a significant performance improvement. This can
    probably be explained by the __pskb_copy() operation performed on the
    buffer for each recipient during reception. We found the optimal value
    for this, considering the most relevant skb pool, to be 3744 bytes.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     

01 Dec, 2017

6 commits

  • Rafal Ozieblo says:

    ====================
    Receive packets filtering for macb driver

    This patch series adds support for receive packet
    filtering for the Cadence GEM driver. Packets can be redirected
    to different hardware queues based on source IP, destination IP,
    source port or destination port. To enable filtering,
    support for RX queueing was added as well.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • This patch allows filtering received packets to different
    hardware queues (aka ntuple).

    Signed-off-by: Rafal Ozieblo
    Signed-off-by: David S. Miller

    Rafal Ozieblo
     
  • Added statistics per queue:
    - qX_rx_packets
    - qX_rx_bytes
    - qX_rx_dropped
    - qX_tx_packets
    - qX_tx_bytes
    - qX_tx_dropped

    Signed-off-by: Rafal Ozieblo
    Signed-off-by: David S. Miller

    Rafal Ozieblo
     
  • To enable packet reception on different RX queues, some
    configuration has to be performed. This patch checks how many
    hardware queues the GEM supports and initializes them.

    Signed-off-by: Rafal Ozieblo
    Signed-off-by: David S. Miller

    Rafal Ozieblo
     
  • There are several reasons for increasing the receive ring sizes:

    1. The original ring size of 256 was chosen about 10 years ago when
    vmxnet3 was first created. At that time, 10Gbps Ethernet was not prevalent
    and servers were dominated by 1Gbps Ethernet. Now 10Gbps is commonplace,
    and higher bandwidth links -- 25Gbps, 40Gbps, 50Gbps -- are starting
    to appear. 256 Rx ring entries are simply not enough to keep up with
    higher link speed when there is a burst of network frames coming from
    these high speed links. Even with full MTU size frames, they are gone
    in a short time. It is also more common to have a mix of frame sizes,
    and more likely a bi-modal distribution of frame sizes, so the average
    frame size is not close to full MTU. If we consider an average frame size
    of 800B, 1024 frames that come in a burst take ~0.65 ms to arrive at
    10Gbps. With 256 entries, the ring fills in only ~0.16 ms at 10Gbps.
    At 25Gbps or 40Gbps,
    this time is reduced accordingly.

    2. On a hypervisor where there are many VMs and the CPU is overcommitted,
    i.e. the number of VCPUs is more than the number of PCPUs, each PCPU is
    in effect time-shared between multiple VMs/VCPUs. The time granularity at
    which this multiplexing occurs is typically coarser than between processes
    on a guest OS. Trying to time slice more finely is not efficient, for
    example, if memory cache is barely warmed up when switching from one VM
    to another occurs. This CPU overcommit adds delay to when the driver
    in a VM can service incoming packets. Whether CPU is over committed
    really depends on customer workloads. For certain situations, it is very
    common. For example, workloads of desktop VMs and product testing setups.
    Consolidation and sharing is what drives efficiency of a customer setup
    for such workloads. In these situations, the raw network bandwidth may
    not be very high, but the delays between when a VM is running or not
    running can also be relatively long.

    Signed-off-by: Shrikrishna Khare
    Acked-by: Jin Heo
    Acked-by: Guolin Yang
    Acked-by: Boon Ang
    Signed-off-by: David S. Miller

    Shrikrishna Khare
     
  • Utilize the much more capable b53_get_tag_protocol() which takes care of
    all Broadcom switches specifics to resolve which port can have Broadcom
    tags enabled or not.

    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Florian Fainelli
     

30 Nov, 2017

30 commits

  • Since commit e32ea7e74727 ("soreuseport: fast reuseport UDP socket
    selection") and commit c125e80b8868 ("soreuseport: fast reuseport
    TCP socket selection") the relevant reuseport socket matching the current
    packet is selected by the reuseport_select_sock() call. The only
    exceptions are invalid BPF filters/filters returning out-of-range
    indices.
    In the latter case the code implicitly falls back to using the hash
    demultiplexing, but instead of selecting the socket inside the
    reuseport_select_sock() function, it relies on the hash selection
    logic introduced with the early soreuseport implementation.

    With this patch, in case of a BPF filter returning a bad socket
    index value, we fall back to hash-based selection inside the
    reuseport_select_sock() body, so that we can drop some duplicate
    code in the ipv4 and ipv6 stack.

    This also allows faster lookup in the above scenario and will allow
    us to avoid computing the hash value for successful, BPF based
    demultiplexing - in a later patch.

    Signed-off-by: Paolo Abeni
    Acked-by: Craig Gallek
    Signed-off-by: David S. Miller

    Paolo Abeni
     
  • This is not supported anymore; devices needing a MAC address
    just assign one at random. It's just a driver peculiarity.

    Signed-off-by: Linus Walleij
    Reviewed-by: Andrew Lunn
    Signed-off-by: David S. Miller

    Linus Walleij
     
  • David Miller says:

    ====================
    net: Significantly shrink the size of routes.

    Through a combination of several things, our route structures are
    larger than they need to be.

    Mostly this stems from having members in dst_entry which are only used
    by one class of routes. So the majority of the work in this series is
    about "un-commoning" these members and pushing them into the type
    specific structures.

    Unfortunately, IPSEC needed the most surgery. The majority of the
    changes here had to do with bundle creation and management.

    The other issue is the refcount alignment in dst_entry. Once we get
    rid of the not-so-common members, it really opens the door to removing
    that alignment entirely.

    I think the new layout looks really nice, so I'll reproduce it here:

    struct net_device *dev;
    struct dst_ops *ops;
    unsigned long _metrics;
    unsigned long expires;
    struct xfrm_state *xfrm;
    int (*input)(struct sk_buff *);
    int (*output)(struct net *net, struct sock *sk, struct sk_buff *skb);
    unsigned short flags;
    short obsolete;
    unsigned short header_len;
    unsigned short trailer_len;
    atomic_t __refcnt;
    int __use;
    unsigned long lastuse;
    struct lwtunnel_state *lwtstate;
    struct rcu_head rcu_head;
    short error;
    short __pad;
    __u32 tclassid;

    (This is for 64-bit, on 32-bit the __refcnt comes at the very end)

    So, the good news:

    1) struct dst_entry shrinks from 160 to 112 bytes.

    2) struct rtable shrinks from 216 to 168 bytes.

    3) struct rt6_info shrinks from 384 to 320 bytes.

    Enjoy.

    v2:
    Collapse some patches logically based upon feedback.
    Fix the strange patch #7.

    v3: xfrm_dst_path() needs inline keyword
    Properly align __refcnt on 32-bit.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • There are no more users.

    Signed-off-by: David S. Miller
    Reviewed-by: Eric Dumazet

    David Miller
     
  • While building ipsec bundles, blocks of xfrm dsts are linked together
    using dst->next from bottom to the top.

    The only thing this is used for is initializing the pmtu values of the
    xfrm stack, and for updating the mtu values at xfrm_bundle_ok() time.

    The bundle pmtu entries must be processed in this order so that pmtu
    values lower in the stack of routes can propagate up to the higher
    ones.

    Avoid using dst->next by simply maintaining an array of dst pointers
    as we already do for the xfrm_state objects when building the bundle.

    Signed-off-by: David S. Miller
    Reviewed-by: Eric Dumazet

    David Miller
     
  • We have padding to try and align the refcount on a separate cache
    line. But after several simplifications the padding has increased
    substantially.

    So now it's easy to change the layout to get rid of the padding
    entirely.

    We group the write-heavy __refcnt and __use with less often used
    items such as the rcu_head and the error code.

    Signed-off-by: David S. Miller
    Reviewed-by: Eric Dumazet

    David Miller
     
  • The first member of an IPSEC route bundle chain sets its dst->path to
    the underlying ipv4/ipv6 route that carries the bundle.

    Stated another way, if one were to follow the xfrm_dst->child chain of
    the bundle, the final non-NULL pointer would be the path and point to
    either an ipv4 or an ipv6 route.

    This is largely used to make sure that PMTU events propagate down to
    the correct ipv4 or ipv6 route.

    When we don't have the top of an IPSEC bundle 'dst->path == dst'.

    Move it down into xfrm_dst and key off of dst->xfrm.

    Signed-off-by: David S. Miller
    Reviewed-by: Eric Dumazet

    David Miller
     
  • The dst->from value is only used by ipv6 routes to track where
    a route "came from".

    Any time we clone or copy a core ipv6 route in the ipv6 routing
    tables, we have the copy/clone's ->from point to the base route.

    This is used to handle route expiration properly.

    Only ipv6 uses this mechanism, and only ipv6 code references
    it. So it is safe to move it into rt6_info.

    Signed-off-by: David S. Miller
    Reviewed-by: Eric Dumazet

    David Miller
     
  • XFRM bundle child chains look like this:

    xdst1 --> xdst2 --> xdst3 --> path_dst

    All of xdstN are xfrm_dst objects and xdst->u.dst.xfrm is non-NULL.
    The final child pointer in the chain, here called 'path_dst', is some
    other kind of route such as an ipv4 or ipv6 one.

    The xfrm output path pops routes, one at a time, via the child
    pointer, until we hit one which has a dst->xfrm pointer which
    is NULL.

    We can easily preserve the above mechanisms with child sitting
    only in the xfrm_dst structure. All children in the chain
    before we break out of the xfrm_output() loop have dst->xfrm
    non-NULL and are therefore xfrm_dst objects.

    Since we break out of the loop when we find dst->xfrm NULL, we
    will not try to dereference 'dst' as if it were an xfrm_dst.

    Signed-off-by: David S. Miller

    David Miller
     
  • This will make a future change moving the dst->child pointer less
    invasive.

    Signed-off-by: David S. Miller
    Reviewed-by: Eric Dumazet

    David Miller
     
  • Only IPSEC routes have a non-NULL dst->child pointer. And IPSEC
    routes are identified by a non-NULL dst->xfrm pointer.

    Signed-off-by: David S. Miller

    David Miller
     
  • Signed-off-by: David S. Miller
    Reviewed-by: Eric Dumazet

    David Miller
     
  • Signed-off-by: David S. Miller
    Reviewed-by: Eric Dumazet

    David Miller
     
  • Delete it.

    Signed-off-by: David S. Miller
    Reviewed-by: Eric Dumazet

    David Miller
     
  • In xmit, it is very unlikely that TX_ERROR occurs, so using
    unlikely() optimizes the xmit path.

    CC: Srinivas Eeda
    CC: Joe Jin
    CC: Junxiao Bi
    Signed-off-by: Zhu Yanjun
    Signed-off-by: David S. Miller

    Zhu Yanjun
     
  • net/atm/mpoa_* files use 'struct timeval' to store event
    timestamps. struct timeval uses a 32-bit seconds field which will
    overflow in the year 2038 and beyond. Moreover, the timestamps are being
    compared only to get seconds elapsed, so struct timeval, which stores
    seconds and microseconds fields, is overkill. This patch replaces
    the use of struct timeval with time64_t to store a 64-bit seconds field.

    Signed-off-by: Tina Ruchandani
    Signed-off-by: Arnd Bergmann
    Signed-off-by: David S. Miller

    Tina Ruchandani
     
  • There are several statements that have incorrect indentation. Fix
    these.

    Signed-off-by: Colin Ian King
    Signed-off-by: David S. Miller

    Colin Ian King
     
  • timespec is deprecated because of the y2038 overflow, so let's convert
    this one to ktime_get_ts64(). The code is already safe even on 32-bit
    architectures, since it uses monotonic times. On 64-bit architectures,
    nothing changes, while on 32-bit architectures this avoids one
    type conversion.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: David S. Miller

    Arnd Bergmann
     
  • netxen_collect_minidump() evidently just wants to get a monotonic
    timestamp. Using jiffies_to_timespec(jiffies, &ts) is not
    appropriate here, since it will overflow after 2^32 jiffies,
    which may be as short as 49 days of uptime.

    ktime_get_seconds() is the correct interface here.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: David S. Miller

    Arnd Bergmann
     
  • Previously phy_id was u32 and phy_id_mask was unsigned int. As the
    phy_id_mask defines the important bits of the phy_id (and is therefore
    the same size) these two variables should be the same data type.

    Signed-off-by: Richard Leitner
    Reviewed-by: Florian Fainelli
    Reviewed-by: Andrew Lunn
    Signed-off-by: David S. Miller

    Richard Leitner
     
  • No need to reinvent the wheel, we have bus_find_device_by_name().

    Cc: Grygorii Strashko
    Signed-off-by: Lukas Wunner
    Signed-off-by: David S. Miller

    Lukas Wunner
     
  • On T81 there are only 4 cores, hence setting the max queue count to 4
    would leave nothing for XDP_TX. This patch fixes this by doubling the
    max queue count in such scenarios.

    Signed-off-by: Sunil Goutham
    Signed-off-by: cjacob
    Signed-off-by: Aleksey Makarov
    Signed-off-by: David S. Miller

    Sunil Goutham
     
  • This patch adds support for XDP_REDIRECT. Flush is not
    yet supported.

    Signed-off-by: Sunil Goutham
    Signed-off-by: cjacob
    Signed-off-by: Aleksey Makarov
    Signed-off-by: David S. Miller

    Sunil Goutham
     
  • Pull nfsd fixes from Bruce Fields:
    "I screwed up my merge window pull request; I only sent half of what I
    meant to.

    There were no new features, just bugfixes of various importance and
    some very minor cleanup, so I think it's all still appropriate for
    -rc2.

    Highlights:

    - Fixes from Trond for some races in the NFSv4 state code.

    - Fix from Naofumi Honda for a typo in the blocked lock notification
    code

    - Fixes from Vasily Averin for some problems starting and stopping
    lockd especially in network namespaces"

    * tag 'nfsd-4.15-1' of git://linux-nfs.org/~bfields/linux: (23 commits)
    lockd: fix "list_add double add" caused by legacy signal interface
    nlm_shutdown_hosts_net() cleanup
    race of nfsd inetaddr notifiers vs nn->nfsd_serv change
    race of lockd inetaddr notifiers vs nlmsvc_rqst change
    SUNRPC: make cache_detail structures const
    NFSD: make cache_detail structures const
    sunrpc: make the function arg as const
    nfsd: check for use of the closed special stateid
    nfsd: fix panic in posix_unblock_lock called from nfs4_laundromat
    lockd: lost rollback of set_grace_period() in lockd_down_net()
    lockd: added cleanup checks in exit_net hook
    grace: replace BUG_ON by WARN_ONCE in exit_net hook
    nfsd: fix locking validator warning on nfs4_ol_stateid->st_mutex class
    lockd: remove net pointer from messages
    nfsd: remove net pointer from debug messages
    nfsd: Fix races with check_stateid_generation()
    nfsd: Ensure we check stateid validity in the seqid operation checks
    nfsd: Fix race in lock stateid creation
    nfsd4: move find_lock_stateid
    nfsd: Ensure we don't recognise lock stateids after freeing them
    ...

    Linus Torvalds
     
  • Pull btrfs fixes from David Sterba:
    "We've collected some fixes since the pre-merge window freeze.

    There's technically only one regression fix for 4.15, but the rest
    seems important and candidates for stable.

    - fix missing flush bio puts in error cases (is serious, but rarely
    happens)

    - fix reporting stat::st_blocks for buffered append writes

    - fix space cache invalidation

    - fix out of bound memory access when setting zlib level

    - fix potential memory corruption when fsync fails in the middle

    - fix crash in integrity checker

    - incremental send fix, path mixup for certain unlink/rename
    combination

    - pass flags to writeback so compressed writes can be throttled
    properly

    - error handling fixes"

    * tag 'for-4.15-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
    Btrfs: incremental send, fix wrong unlink path after renaming file
    btrfs: tree-checker: Fix false panic for sanity test
    Btrfs: fix list_add corruption and soft lockups in fsync
    btrfs: Fix wild memory access in compression level parser
    btrfs: fix deadlock when writing out space cache
    btrfs: clear space cache inode generation always
    Btrfs: fix reported number of inode blocks after buffered append writes
    Btrfs: move definition of the function btrfs_find_new_delalloc_bytes
    Btrfs: bail out gracefully rather than BUG_ON
    btrfs: dev_alloc_list is not protected by RCU, use normal list_del
    btrfs: add missing device::flush_bio puts
    btrfs: Fix transaction abort during failure in btrfs_rm_dev_item
    Btrfs: add write_flags for compression bio

    Linus Torvalds
     
  • Pull Microblaze fix from Michal Simek:
    "Add missing header to mmu_context_mm.h"

    * tag 'microblaze-4.15-rc2' of git://git.monstr.eu/linux-2.6-microblaze:
    microblaze: add missing include to mmu_context_mm.h

    Linus Torvalds
     
  • Pull sparc fix from David Miller:
    "Sparc T4 and later cpu bootup regression fix"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
    sparc64: Fix boot on T4 and later.

    Linus Torvalds
     
  • Pull networking fixes from David Miller:

    1) The forcedeth conversion from pci_*() DMA interfaces to dma_*() ones
    missed one spot. From Zhu Yanjun.

    2) Missing CRYPTO_SHA256 Kconfig dep in cfg80211, from Johannes Berg.

    3) Fix checksum offloading in thunderx driver, from Sunil Goutham.

    4) Add SPDX to vm_sockets_diag.h, from Stephen Hemminger.

    5) Fix use after free of packet headers in TIPC, from Jon Maloy.

    6) "sizeof(ptr)" vs "sizeof(*ptr)" bug in i40e, from Gustavo A R Silva.

    7) Tunneling fixes in mlxsw driver, from Petr Machata.

    8) Fix crash in fanout_demux_rollover() of AF_PACKET, from Mike
    Maloney.

    9) Fix race in AF_PACKET bind() vs. NETDEV_UP notifier, from Eric
    Dumazet.

    10) Fix regression in sch_sfq.c due to one of the timer_setup()
    conversions. From Paolo Abeni.

    11) SCTP does list_for_each_entry() using wrong struct member, fix from
    Xin Long.

    12) Don't use big endian netlink attribute read for
    IFLA_BOND_AD_ACTOR_SYSTEM, it is in cpu endianness. Also from Xin
    Long.

    13) Fix mis-initialization of q->link.clock in CBQ scheduler, preventing
    adding filters there. From Jiri Pirko.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (67 commits)
    ethernet: dwmac-stm32: Fix copyright
    net: via: via-rhine: use %p to format void * address instead of %x
    net: ethernet: xilinx: Mark XILINX_LL_TEMAC broken on 64-bit
    myri10ge: Update MAINTAINERS
    net: sched: cbq: create block for q->link.block
    atm: suni: remove extraneous space to fix indentation
    atm: lanai: use %p to format kernel addresses instead of %x
    VSOCK: Don't set sk_state to TCP_CLOSE before testing it
    atm: fore200e: use %pK to format kernel addresses instead of %x
    ambassador: fix incorrect indentation of assignment statement
    vxlan: use __be32 type for the param vni in __vxlan_fdb_delete
    bonding: use nla_get_u64 to extract the value for IFLA_BOND_AD_ACTOR_SYSTEM
    sctp: use right member as the param of list_for_each_entry
    sch_sfq: fix null pointer dereference at timer expiration
    cls_bpf: don't decrement net's refcount when offload fails
    net/packet: fix a race in packet_bind() and packet_notifier()
    packet: fix crash in fanout_demux_rollover()
    sctp: remove extern from stream sched
    sctp: force the params with right types for sctp csum apis
    sctp: force SCTP_ERROR_INV_STRM with __u32 when calling sctp_chunk_fail
    ...

    Linus Torvalds
     
  • If we don't put the NG4fls.o object into the same part of
    the link as the generic sparc64 objects for fls() and __fls()
    then the relocation in the branch we use for patching will
    not fit.

    Move NG4fls.o into lib-y to fix this problem.

    Fixes: 46ad8d2d22c1 ("sparc64: Use sparc optimized fls and __fls for T4 and above")
    Signed-off-by: David S. Miller
    Reported-by: Anatoly Pugachev
    Tested-by: Anatoly Pugachev

    David S. Miller
     
  • Instead, just fall back on the new '%p' behavior which hashes the
    pointer.

    Otherwise, '%pK' - which was intended to mark a pointer as restricted -
    just ends up leaking pointers that a normal '%p' wouldn't leak, which
    just makes the whole thing pointless.

    I suspect we should actually get rid of '%pK' entirely, and make it just
    work as '%p' regardless, but this is the minimal obvious fix. People
    who actually use 'kptr_restrict' should weigh in on which behavior they
    want.

    Cc: Tobin Harding
    Cc: Kees Cook
    Signed-off-by: Linus Torvalds

    Linus Torvalds