06 Mar, 2019

1 commit

  • Patch series "Replace all open encodings for NUMA_NO_NODE", v3.

    All the places to replace were found by running the following grep
    patterns over the entire kernel source. Please let me know if this
    missed any instances. It may also have replaced some false positives.
    I would appreciate suggestions, input and review.

    1. git grep "nid == -1"
    2. git grep "node == -1"
    3. git grep "nid = -1"
    4. git grep "node = -1"

    This patch (of 2):

    At present there are multiple places where an invalid node number is
    encoded as -1. Even though this is implicitly understood, it is always
    better to use a macro. Replace these open encodings for an invalid node
    number with the global macro NUMA_NO_NODE. This removes NUMA-related
    assumptions such as 'invalid node' from various places and redirects
    them to a common definition.

    Link: http://lkml.kernel.org/r/1545127933-10711-2-git-send-email-anshuman.khandual@arm.com
    Signed-off-by: Anshuman Khandual
    Reviewed-by: David Hildenbrand
    Acked-by: Jeff Kirsher [ixgbe]
    Acked-by: Jens Axboe [mtip32xx]
    Acked-by: Vinod Koul [dmaengine.c]
    Acked-by: Michael Ellerman [powerpc]
    Acked-by: Doug Ledford [drivers/infiniband]
    Cc: Joseph Qi
    Cc: Hans Verkuil
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anshuman Khandual
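A minimal C sketch of the replacement pattern described in this series, assuming a userspace stand-in for NUMA_NO_NODE (the kernel defines it as (-1) in include/linux/numa.h); resolve_node() is a hypothetical helper, not a kernel function:

```c
/* Userspace stand-in for the kernel definition in include/linux/numa.h. */
#ifndef NUMA_NO_NODE
#define NUMA_NO_NODE (-1)
#endif

/*
 * Hypothetical helper: return the caller's node, or fall back to another
 * node when the caller expressed no preference. Before the conversion this
 * test would have been open-coded as "nid == -1".
 */
static int resolve_node(int nid, int fallback_nid)
{
    if (nid == NUMA_NO_NODE)
        return fallback_nid;
    return nid;
}
```

Comparing against the named macro keeps the meaning of -1 in one place, which is the point of the conversion.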
     

14 Sep, 2018

1 commit


28 Jul, 2018

1 commit

  • Steffen Klassert says:

    ====================
    pull request (net-next): ipsec-next 2018-07-27

    1) Extend the output_mark to also support the input direction
    and masking the mark values before applying to the skb.

    2) Add a new lookup key for the upcoming xfrm interfaces.

    3) Extend the xfrm lookups to match xfrm interface IDs.

    4) Add virtual xfrm interfaces. The purpose of these interfaces
    is to overcome the design limitations that the existing
    VTI devices have.

    The main limitations that we see with the current VTI are the
    following:

    VTI interfaces are L3 tunnels with configurable endpoints.
    For xfrm, the tunnel endpoints are already determined by the SA.
    So the VTI tunnel endpoints must be either the same as on the
    SA or wildcards. If the VTI tunnel endpoints are the same as on
    the SA, we get a one-to-one correlation between the SA and
    the tunnel. So each SA needs its own tunnel interface.

    On the other hand, we can have only one VTI tunnel with
    wildcard src/dst tunnel endpoints in the system because the
    lookup is based on the tunnel endpoints. The existing tunnel
    lookup won't work with multiple tunnels with wildcard
    tunnel endpoints. Some use cases require more than one
    VTI tunnel of this type, for example if somebody has multiple
    namespaces and every namespace requires such a VTI.

    VTI needs separate interfaces for IPv4 and IPv6 tunnels.
    So when routing to a VTI, we have to know to which address
    family this traffic class is going to be encapsulated.
    This is a limitation because it makes routing more complex
    and it is not always possible to know what happens behind the
    VTI, e.g. when the VTI is moved to some namespace.

    VTI works only with tunnel mode SAs. We need generic interfaces
    that ensure transformation, regardless of the xfrm mode and
    the encapsulated address family.

    VTI is configured with a combination of GRE keys and xfrm marks.
    With this we have to deal with some extra cases in the generic
    tunnel lookup because the GRE keys on the VTI are actually
    not GRE keys; the GRE keys were just reused for something else.
    All extensions to the VTI interfaces would require adding
    even more complexity to the generic tunnel lookup.

    So to overcome this, we developed xfrm interfaces with the
    following design goals:

    It should be possible to tunnel IPv4 and IPv6 through the same
    interface.

    No limitation on xfrm mode (tunnel, transport and beet).

    Should be a generic virtual interface that ensures IPsec
    transformation, no need to know what happens behind the
    interface.

    Interfaces should be configured with a new key that must match a
    new policy/SA lookup key.

    The lookup logic should stay in the xfrm codebase, no need to
    change or extend generic routing and tunnel lookups.

    Should be possible to use IPsec hardware offloads of the underlying
    interface.

    5) Remove xfrm pcpu policy cache. This was added after the flowcache
    removal, but it turned out to make things even worse.
    From Florian Westphal.

    6) Allow the set mark to be updated on SA updates.
    From Nathan Harold.

    7) Convert some timestamps to time64_t.
    From Arnd Bergmann.

    8) Don't check the offload_handle in xfrm code,
    it is an opaque data cookie for the driver.
    From Shannon Nelson.

    9) Remove xfrmi interface ID from flowi. After this patch
    no generic code is touched anymore to do xfrm interface
    lookups. From Benedict Wong.

    10) Allow the xfrm interface ID to be updated on SA updates.
    From Nathan Harold.

    11) Don't pass zero to ERR_PTR() in xfrm_resolve_and_create_bundle.
    From YueHaibing.

    12) Return more detailed errors on xfrm interface creation.
    From Benedict Wong.

    13) Use PTR_ERR_OR_ZERO instead of IS_ERR + PTR_ERR.
    From the kbuild test robot.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
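Item 1) above mentions masking the mark values before applying them to the skb. A minimal sketch of such a masked update, assuming a value/mask pair; struct mark_vm and apply_masked_mark() are hypothetical names, not the xfrm API:

```c
#include <stdint.h>

/* Hypothetical value/mask pair describing which mark bits to set. */
struct mark_vm {
    uint32_t v; /* value to apply */
    uint32_t m; /* mask selecting which bits of the existing mark to replace */
};

/*
 * Replace only the masked bits of the packet's current mark with the
 * configured value; bits outside the mask are preserved.
 */
static uint32_t apply_masked_mark(uint32_t skb_mark, const struct mark_vm *cfg)
{
    return (cfg->v & cfg->m) | (skb_mark & ~cfg->m);
}
```

Bits outside the mask keep whatever mark the packet already carried; with an all-ones mask the configured value simply replaces the mark.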
     

19 Jul, 2018

1 commit

  • GCC 8 complains:

    net/core/pktgen.c: In function ‘pktgen_if_write’:
    net/core/pktgen.c:1419:4: warning: ‘strncpy’ output may be truncated copying between 0 and 31 bytes from a string of length 127 [-Wstringop-truncation]
    strncpy(pkt_dev->src_max, buf, len);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    net/core/pktgen.c:1399:4: warning: ‘strncpy’ output may be truncated copying between 0 and 31 bytes from a string of length 127 [-Wstringop-truncation]
    strncpy(pkt_dev->src_min, buf, len);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    net/core/pktgen.c:1290:4: warning: ‘strncpy’ output may be truncated copying between 0 and 31 bytes from a string of length 127 [-Wstringop-truncation]
    strncpy(pkt_dev->dst_max, buf, len);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    net/core/pktgen.c:1268:4: warning: ‘strncpy’ output may be truncated copying between 0 and 31 bytes from a string of length 127 [-Wstringop-truncation]
    strncpy(pkt_dev->dst_min, buf, len);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    There is no bug here, but the code is not perfect either. It copies
    up to sizeof(pkt_dev->/member/) - 1 bytes from user space into buf, and
    then does a strcmp(pkt_dev->/member/, buf), hence assuming buf will be
    null-terminated and shorter than pkt_dev->/member/ (pkt_dev->/member/ is
    never explicitly null-terminated, and strncpy() doesn't have to
    null-terminate, so the assumption must be on buf). The use of strncpy()
    without explicit null-termination looks suspicious. Convert to use
    straight strcpy().

    strncpy() would also null-pad the output, but that's clearly unnecessary
    since the author calls memset(pkt_dev->/member/, 0, sizeof(..)); prior
    to strncpy(), anyway.

    While at it, format the code for "dst_min", "dst_max", "src_min" and
    "src_max" in the same way by removing extra newlines in one case.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Jiong Wang
    Signed-off-by: David S. Miller

    Jakub Kicinski
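The reasoning in the pktgen commit above can be illustrated with a small userspace sketch, assuming the destination size is passed explicitly; set_addr_field() is a hypothetical helper. Once the source is known to be NUL-terminated and to fit, plain strcpy() copies exactly what is needed and the strncpy() bound adds nothing:

```c
#include <string.h>

/*
 * Hypothetical helper: copy a NUL-terminated string into a fixed-size
 * field. The length check guarantees the string plus its terminator fits,
 * so strcpy() cannot overrun dst and no padding is needed.
 * Returns 0 on success, -1 if the string would not fit.
 */
static int set_addr_field(char *dst, size_t dst_size, const char *buf)
{
    if (strlen(buf) >= dst_size)
        return -1;
    strcpy(dst, buf);
    return 0;
}
```

In pktgen the length guarantee comes from how many bytes are copied from user space, rather than from an explicit strlen() check as here.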
     

23 Jun, 2018

1 commit

  • This patch adds the xfrm interface id as a lookup key
    for xfrm states and policies. With this we can assign
    states and policies to virtual xfrm interfaces.

    Signed-off-by: Steffen Klassert
    Acked-by: Shannon Nelson
    Acked-by: Benedict Wong
    Tested-by: Benedict Wong
    Tested-by: Antony Antony
    Reviewed-by: Eyal Birger

    Steffen Klassert
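The effect of adding the interface ID as a lookup key can be sketched with a toy table, assuming a simplified SA entry keyed only by SPI and interface ID; struct sa_entry and sa_lookup() are hypothetical and omit the addresses, protocol and marks that real xfrm lookups also compare:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified SA entry. */
struct sa_entry {
    uint32_t spi;
    uint32_t if_id;   /* xfrm interface this SA is assigned to */
};

/*
 * Linear lookup requiring both the SPI and the interface ID to match.
 * Returns the matching entry, or NULL if none matches.
 */
static const struct sa_entry *sa_lookup(const struct sa_entry *tbl, size_t n,
                                        uint32_t spi, uint32_t if_id)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].spi == spi && tbl[i].if_id == if_id)
            return &tbl[i];
    return NULL;
}
```

In this sketch, two entries that share an SPI are told apart solely by their interface ID.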
     

13 Jun, 2018

1 commit

  • The vzalloc_node() function has no 2-factor argument form, so
    multiplication factors need to be wrapped in array_size(). This patch
    replaces cases of:

    vzalloc_node(a * b, node)

    with:

    vzalloc_node(array_size(a, b), node)

    as well as handling cases of:

    vzalloc_node(a * b * c, node)

    with:

    vzalloc_node(array3_size(a, b, c), node)

    This does, however, attempt to ignore constant size factors like:

    vzalloc_node(4 * 1024, node)

    though any constants defined via macros get caught up in the conversion.

    Any factors with a sizeof() of "unsigned char", "char", and "u8" were
    dropped, since they're redundant.

    The Coccinelle script used for this was:

    // Fix redundant parens around sizeof().
    @@
    type TYPE;
    expression THING, E;
    @@

    (
    vzalloc_node(
    - (sizeof(TYPE)) * E
    + sizeof(TYPE) * E
    , ...)
    |
    vzalloc_node(
    - (sizeof(THING)) * E
    + sizeof(THING) * E
    , ...)
    )

    // Drop single-byte sizes and redundant parens.
    @@
    expression COUNT;
    typedef u8;
    typedef __u8;
    @@

    (
    vzalloc_node(
    - sizeof(u8) * (COUNT)
    + COUNT
    , ...)
    |
    vzalloc_node(
    - sizeof(__u8) * (COUNT)
    + COUNT
    , ...)
    |
    vzalloc_node(
    - sizeof(char) * (COUNT)
    + COUNT
    , ...)
    |
    vzalloc_node(
    - sizeof(unsigned char) * (COUNT)
    + COUNT
    , ...)
    |
    vzalloc_node(
    - sizeof(u8) * COUNT
    + COUNT
    , ...)
    |
    vzalloc_node(
    - sizeof(__u8) * COUNT
    + COUNT
    , ...)
    |
    vzalloc_node(
    - sizeof(char) * COUNT
    + COUNT
    , ...)
    |
    vzalloc_node(
    - sizeof(unsigned char) * COUNT
    + COUNT
    , ...)
    )

    // 2-factor product with sizeof(type/expression) and identifier or constant.
    @@
    type TYPE;
    expression THING;
    identifier COUNT_ID;
    constant COUNT_CONST;
    @@

    (
    vzalloc_node(
    - sizeof(TYPE) * (COUNT_ID)
    + array_size(COUNT_ID, sizeof(TYPE))
    , ...)
    |
    vzalloc_node(
    - sizeof(TYPE) * COUNT_ID
    + array_size(COUNT_ID, sizeof(TYPE))
    , ...)
    |
    vzalloc_node(
    - sizeof(TYPE) * (COUNT_CONST)
    + array_size(COUNT_CONST, sizeof(TYPE))
    , ...)
    |
    vzalloc_node(
    - sizeof(TYPE) * COUNT_CONST
    + array_size(COUNT_CONST, sizeof(TYPE))
    , ...)
    |
    vzalloc_node(
    - sizeof(THING) * (COUNT_ID)
    + array_size(COUNT_ID, sizeof(THING))
    , ...)
    |
    vzalloc_node(
    - sizeof(THING) * COUNT_ID
    + array_size(COUNT_ID, sizeof(THING))
    , ...)
    |
    vzalloc_node(
    - sizeof(THING) * (COUNT_CONST)
    + array_size(COUNT_CONST, sizeof(THING))
    , ...)
    |
    vzalloc_node(
    - sizeof(THING) * COUNT_CONST
    + array_size(COUNT_CONST, sizeof(THING))
    , ...)
    )

    // 2-factor product, only identifiers.
    @@
    identifier SIZE, COUNT;
    @@

    vzalloc_node(
    - SIZE * COUNT
    + array_size(COUNT, SIZE)
    , ...)

    // 3-factor product with 1 sizeof(type) or sizeof(expression), with
    // redundant parens removed.
    @@
    expression THING;
    identifier STRIDE, COUNT;
    type TYPE;
    @@

    (
    vzalloc_node(
    - sizeof(TYPE) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    vzalloc_node(
    - sizeof(TYPE) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    vzalloc_node(
    - sizeof(TYPE) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    vzalloc_node(
    - sizeof(TYPE) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    vzalloc_node(
    - sizeof(THING) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    vzalloc_node(
    - sizeof(THING) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    vzalloc_node(
    - sizeof(THING) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    vzalloc_node(
    - sizeof(THING) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    )

    // 3-factor product with 2 sizeof(variable), with redundant parens removed.
    @@
    expression THING1, THING2;
    identifier COUNT;
    type TYPE1, TYPE2;
    @@

    (
    vzalloc_node(
    - sizeof(TYPE1) * sizeof(TYPE2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    vzalloc_node(
    - sizeof(TYPE1) * sizeof(TYPE2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    vzalloc_node(
    - sizeof(THING1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    vzalloc_node(
    - sizeof(THING1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    vzalloc_node(
    - sizeof(TYPE1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    |
    vzalloc_node(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    )

    // 3-factor product, only identifiers, with redundant parens removed.
    @@
    identifier STRIDE, SIZE, COUNT;
    @@

    (
    vzalloc_node(
    - (COUNT) * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vzalloc_node(
    - COUNT * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vzalloc_node(
    - COUNT * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vzalloc_node(
    - (COUNT) * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vzalloc_node(
    - COUNT * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vzalloc_node(
    - (COUNT) * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vzalloc_node(
    - (COUNT) * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vzalloc_node(
    - COUNT * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    )

    // Any remaining multi-factor products, first at least 3-factor products
    // when they're not all constants...
    @@
    expression E1, E2, E3;
    constant C1, C2, C3;
    @@

    (
    vzalloc_node(C1 * C2 * C3, ...)
    |
    vzalloc_node(
    - E1 * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    )

    // And then all remaining 2 factors products when they're not all constants.
    @@
    expression E1, E2;
    constant C1, C2;
    @@

    (
    vzalloc_node(C1 * C2, ...)
    |
    vzalloc_node(
    - E1 * E2
    + array_size(E1, E2)
    , ...)
    )

    Signed-off-by: Kees Cook

    Kees Cook
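The array_size() and array3_size() helpers this conversion relies on return SIZE_MAX when the multiplication overflows, so the following allocation fails rather than returning a buffer that is too small. A userspace approximation, assuming GCC or Clang for __builtin_mul_overflow(); the _sketch names are hypothetical, and the real helpers live in include/linux/overflow.h:

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply two sizes, saturating to SIZE_MAX on overflow. */
static size_t array_size_sketch(size_t a, size_t b)
{
    size_t bytes;

    if (__builtin_mul_overflow(a, b, &bytes))
        return SIZE_MAX;
    return bytes;
}

/*
 * Multiply three sizes, saturating to SIZE_MAX if either step overflows.
 * Checking both steps matters: an overflowing a * b must not be "rescued"
 * by a later factor of 0 or 1.
 */
static size_t array3_size_sketch(size_t a, size_t b, size_t c)
{
    size_t bytes;

    if (__builtin_mul_overflow(a, b, &bytes) ||
        __builtin_mul_overflow(bytes, c, &bytes))
        return SIZE_MAX;
    return bytes;
}
```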
     

28 Mar, 2018

1 commit


14 Mar, 2018

2 commits

  • _buf_ is an array and the one that must be freed is _tp_ instead.

    Fixes: a870a02cc963 ("pktgen: use dynamic allocation for debug print buffer")
    Reported-by: Wang Jian
    Signed-off-by: Gustavo A. R. Silva
    Acked-by: Arnd Bergmann
    Signed-off-by: David S. Miller

    Gustavo A. R. Silva
     
  • After the removal of the VLA, we get a harmless warning about a large
    stack frame:

    net/core/pktgen.c: In function 'pktgen_if_write':
    net/core/pktgen.c:1710:1: error: the frame size of 1076 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

    The function was previously shown to be safe despite hitting
    the 1024-byte warning level. To get rid of the annoying warning,
    while keeping it readable, this changes it to use strndup_user().

    Obviously this is not a fast path, so the kmalloc() overhead
    can be disregarded.

    Fixes: 35951393bbff ("pktgen: Remove VLA usage")
    Signed-off-by: Arnd Bergmann
    Signed-off-by: David S. Miller

    Arnd Bergmann
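The two pktgen commits above (heap-allocating the debug copy with strndup_user(), then freeing tp rather than buf) can be approximated in userspace. This is a sketch under assumptions: dup_bounded() is a hypothetical helper built on malloc(), and unlike the kernel's strndup_user(), which returns an error for over-long input, it simply truncates:

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a heap copy of at most max_len characters of
 * src, always NUL-terminated, or NULL if allocation fails. The caller owns
 * the result and must release it with free().
 */
static char *dup_bounded(const char *src, size_t max_len)
{
    size_t len = 0;
    char *copy;

    while (len < max_len && src[len] != '\0')
        len++;

    copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, src, len);
    copy[len] = '\0';
    return copy;
}
```

The caller prints the returned copy and then passes that same pointer to free(); handing the fixed-size array buf to free() instead is the mistake corrected by the follow-up fix.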
     

10 Mar, 2018

1 commit


09 Mar, 2018

1 commit

    These pernet_operations create per-net pktgen threads
    and /proc entries. This pernet subsystem looks self-contained,
    and there are no pernet_operations outside this file
    that are interested in the threads. Init and/or exit
    methods look safe to be executed in parallel.

    Signed-off-by: Kirill Tkhai
    Signed-off-by: David S. Miller

    Kirill Tkhai
     

25 Jan, 2018

4 commits


17 Jan, 2018

1 commit

    /proc has been ignoring the struct file_operations::owner field for 10 years.
    Specifically, it started with commit 786d7e1612f0b0adb6046f19b906609e4fe8b1ba
    ("Fix rmmod/read/write races in /proc entries"). Notice the chunk where
    inode->i_fop is initialized with proxy struct file_operations for
    regular files:

    -    if (de->proc_fops)
    -        inode->i_fop = de->proc_fops;
    +    if (de->proc_fops) {
    +        if (S_ISREG(inode->i_mode))
    +            inode->i_fop = &proc_reg_file_ops;
    +        else
    +            inode->i_fop = de->proc_fops;
    +    }

    The VFS stopped pinning the module at this point.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
     

30 Nov, 2017

1 commit

  • XFRM bundle child chains look like this:

    xdst1 --> xdst2 --> xdst3 --> path_dst

    All of xdstN are xfrm_dst objects and xdst->u.dst.xfrm is non-NULL.
    The final child pointer in the chain, here called 'path_dst', is some
    other kind of route such as an ipv4 or ipv6 one.

    The xfrm output path pops routes, one at a time, via the child
    pointer, until we hit one which has a dst->xfrm pointer which
    is NULL.

    We can easily preserve the above mechanisms with child sitting
    only in the xfrm_dst structure. All children in the chain
    before we break out of the xfrm_output() loop have dst->xfrm
    non-NULL and are therefore xfrm_dst objects.

    Since we break out of the loop when we find dst->xfrm NULL, we
    will not try to dereference 'dst' as if it were an xfrm_dst.

    Signed-off-by: David S. Miller

    David Miller
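The chain walk described above can be sketched with heavily simplified stand-ins for struct dst_entry and struct xfrm_dst (hypothetical layouts with only the fields needed here). Only entries whose xfrm pointer is non-NULL are cast to the xfrm type, so the child pointer never has to live in the generic structure:

```c
#include <stddef.h>

/* Hypothetical stand-in for struct dst_entry: only the field used here. */
struct dst_entry {
    const void *xfrm;           /* non-NULL only for xfrm bundle entries */
};

/* Hypothetical stand-in for struct xfrm_dst. */
struct xfrm_dst {
    struct dst_entry dst;       /* first member, so the cast below is valid */
    struct dst_entry *child;    /* the child pointer lives only here */
};

/*
 * Follow child pointers until reaching an entry whose xfrm pointer is NULL
 * (path_dst in the description above). Stores the number of xfrm entries
 * traversed in *hops and returns the final, non-xfrm route.
 */
static struct dst_entry *walk_to_path(struct dst_entry *dst, int *hops)
{
    *hops = 0;
    while (dst->xfrm != NULL) {
        struct xfrm_dst *xdst = (struct xfrm_dst *)dst;

        dst = xdst->child;
        (*hops)++;
    }
    return dst;
}
```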
     

16 Nov, 2017

1 commit

  • Pull networking updates from David Miller:
    "Highlights:

    1) Maintain the TCP retransmit queue using an rbtree, with 1GB
    windows at 100Gb this really has become necessary. From Eric
    Dumazet.

    2) Multi-program support for cgroup+bpf, from Alexei Starovoitov.

    3) Perform broadcast flooding in hardware in mv88e6xxx, from Andrew
    Lunn.

    4) Add meter action support to openvswitch, from Andy Zhou.

    5) Add a data meta pointer for BPF accessible packets, from Daniel
    Borkmann.

    6) Namespace-ify almost all TCP sysctl knobs, from Eric Dumazet.

    7) Turn on Broadcom Tags in b53 driver, from Florian Fainelli.

    8) More work to move the RTNL mutex down, from Florian Westphal.

    9) Add 'bpftool' utility, to help with bpf program introspection.
    From Jakub Kicinski.

    10) Add new 'cpumap' type for XDP_REDIRECT action, from Jesper
    Dangaard Brouer.

    11) Support 'blocks' of transformations in the packet scheduler which
    can span multiple network devices, from Jiri Pirko.

    12) TC flower offload support in cxgb4, from Kumar Sanghvi.

    13) Priority based stream scheduler for SCTP, from Marcelo Ricardo
    Leitner.

    14) Thunderbolt networking driver, from Amir Levy and Mika Westerberg.

    15) Add RED qdisc offloadability, and use it in mlxsw driver. From
    Nogah Frankel.

    16) eBPF based device controller for cgroup v2, from Roman Gushchin.

    17) Add some fundamental tracepoints for TCP, from Song Liu.

    18) Remove garbage collection from ipv6 route layer, this is a
    significant accomplishment. From Wei Wang.

    19) Add multicast route offload support to mlxsw, from Yotam Gigi"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2177 commits)
    tcp: highest_sack fix
    geneve: fix fill_info when link down
    bpf: fix lockdep splat
    net: cdc_ncm: GetNtbFormat endian fix
    openvswitch: meter: fix NULL pointer dereference in ovs_meter_cmd_reply_start
    netem: remove unnecessary 64 bit modulus
    netem: use 64 bit divide by rate
    tcp: Namespace-ify sysctl_tcp_default_congestion_control
    net: Protect iterations over net::fib_notifier_ops in fib_seq_sum()
    ipv6: set all.accept_dad to 0 by default
    uapi: fix linux/tls.h userspace compilation error
    usbnet: ipheth: prevent TX queue timeouts when device not ready
    vhost_net: conditionally enable tx polling
    uapi: fix linux/rxrpc.h userspace compilation errors
    net: stmmac: fix LPI transitioning for dwmac4
    atm: horizon: Fix irq release error
    net-sysfs: trigger netlink notification on ifalias change via sysfs
    openvswitch: Using kfree_rcu() to simplify the code
    openvswitch: Make local function ovs_nsh_key_attr_size() static
    openvswitch: Fix return value check in ovs_meter_cmd_features()
    ...

    Linus Torvalds
     

08 Nov, 2017

1 commit

  • Timestamps in pktgen are currently retrieved using the deprecated
    do_gettimeofday() function that wraps its signed 32-bit seconds in 2038
    (on 32-bit architectures) and requires a division operation to calculate
    microseconds.

    The pktgen header is also defined with the same limitations, hardcoding
    to a 32-bit seconds field that can be interpreted as unsigned to produce
    times that only wrap in 2106. Whatever code reads the timestamps should
    be aware of that problem in general, but probably doesn't care too
    much as we are mostly interested in the time passing between packets,
    and that is correctly represented.

    Using 64-bit nanoseconds would be cheaper and good for 584 years. Using
    monotonic times would also make this unambiguous by avoiding the overflow,
    but would make it harder to correlate the times with those on remote
    machines. Either approach would require adding a new runtime flag and
    implementing the same thing on the remote side, which we probably don't
    want to do unless someone sees it as a real problem. Also, this should
    be coordinated with other pktgen implementations and might need a new
    magic number.

    For the moment, I'm documenting the overflow in the source code, and
    changing the implementation over to an open-coded ktime_get_real_ts64()
    plus division, so we don't have to look at it again while scanning for
    deprecated time interfaces.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: David S. Miller

    Arnd Bergmann
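A sketch of the conversion and its wraparound, assuming the header stores 32-bit seconds and microseconds as described above; struct pkt_ts_sketch and split_timestamp() are hypothetical, and the inputs would come from ktime_get_real_ts64() in the kernel or clock_gettime(CLOCK_REALTIME, ...) in userspace:

```c
#include <stdint.h>

/* Hypothetical layout, not the on-wire pktgen header. */
struct pkt_ts_sketch {
    uint32_t tv_sec;    /* low 32 bits of the seconds since 1970 */
    uint32_t tv_usec;   /* microseconds within the second */
};

/* Split a 64-bit seconds + nanoseconds time into the 32-bit pair. */
static struct pkt_ts_sketch split_timestamp(int64_t sec, long nsec)
{
    struct pkt_ts_sketch ts;

    ts.tv_sec = (uint32_t)sec;              /* truncates: wraps in 2106 */
    ts.tv_usec = (uint32_t)(nsec / 1000);   /* the division mentioned above */
    return ts;
}
```

Reading the stored seconds as unsigned postpones the wrap to early 2106 (2^32 seconds after 1970); the microsecond field needs the division by 1000 mentioned above.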
     

05 Nov, 2017

1 commit

  • pktgen accidentally used IN6_ADDR_HSIZE, instead of using the size of an
    IPv6 address.

    Since IN6_ADDR_HSIZE recently was increased from 16 to 256, this old
    bug is hitting us.

    Fixes: 3f27fb23219e ("ipv6: addrconf: add per netns perturbation in inet6_addr_hash()")
    Signed-off-by: Eric Dumazet
    Reported-by: Dan Carpenter
    Signed-off-by: David S. Miller

    Eric Dumazet
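The class of bug fixed here is a copy length taken from an unrelated constant. A userspace sketch, assuming the POSIX struct in6_addr from <netinet/in.h>; copy_in6_addr() is a hypothetical helper that derives the length from the address type (16 bytes) instead:

```c
#include <netinet/in.h>
#include <string.h>

/*
 * Copy one IPv6 address. The length comes from the address type itself,
 * never from an unrelated constant such as a hash-table size, which would
 * overrun both buffers once that constant grows.
 */
static void copy_in6_addr(struct in6_addr *dst, const struct in6_addr *src)
{
    memcpy(dst, src, sizeof(*dst));
}
```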
     

25 Oct, 2017

1 commit

  • …READ_ONCE()/WRITE_ONCE()

    Please do not apply this to mainline directly, instead please re-run the
    coccinelle script shown below and apply its output.

    For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
    preference to ACCESS_ONCE(), and new code is expected to use one of the
    former. So far, there's been no reason to change most existing uses of
    ACCESS_ONCE(), as these aren't harmful, and changing them results in
    churn.

    However, for some features, the read/write distinction is critical to
    correct operation. To distinguish these cases, separate read/write
    accessors must be used. This patch migrates (most) remaining
    ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
    coccinelle script:

    ----
    // Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
    // WRITE_ONCE()

    // $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch

    virtual patch

    @ depends on patch @
    expression E1, E2;
    @@

    - ACCESS_ONCE(E1) = E2
    + WRITE_ONCE(E1, E2)

    @ depends on patch @
    expression E;
    @@

    - ACCESS_ONCE(E)
    + READ_ONCE(E)
    ----

    Signed-off-by: Mark Rutland <mark.rutland@arm.com>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: davem@davemloft.net
    Cc: linux-arch@vger.kernel.org
    Cc: mpe@ellerman.id.au
    Cc: shuah@kernel.org
    Cc: snitzer@redhat.com
    Cc: thor.thayer@linux.intel.com
    Cc: tj@kernel.org
    Cc: viro@zeniv.linux.org.uk
    Cc: will.deacon@arm.com
    Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Mark Rutland
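To show what the conversion means at the source level, here are simplified userspace stand-ins for the two accessors, assuming GCC or Clang for __typeof__; they are not the kernel's definitions, which handle more cases. The flag and helpers are hypothetical, and each comment shows the ACCESS_ONCE() form the script would have rewritten:

```c
/* Simplified stand-ins: a volatile access the compiler must perform once. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical flag set by one context and polled by another. */
static int stop_requested;

static void request_stop(void)
{
    WRITE_ONCE(stop_requested, 1);      /* was: ACCESS_ONCE(stop_requested) = 1 */
}

static int should_stop(void)
{
    return READ_ONCE(stop_requested);   /* was: ACCESS_ONCE(stop_requested) */
}
```

A volatile access only stops the compiler from caching, merging or repeating the load or store; in portable userspace code, C11 atomics remain the correct tool for data shared between threads.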
     

01 Jul, 2017

1 commit

    The refcount_t type and corresponding API should be
    used instead of atomic_t when the variable is used as
    a reference counter. This helps avoid accidental
    reference counter overflows that might lead to
    use-after-free situations.

    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Signed-off-by: David S. Miller

    Reshetova, Elena
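The overflow protection can be illustrated with a saturating counter built on C11 atomics. This is a sketch, not the kernel's refcount_t implementation: refcount_sketch_t, ref_inc() and ref_dec_and_test() are hypothetical, and saturating at UINT_MAX is an assumption made for illustration:

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical saturating reference counter. */
typedef struct {
    atomic_uint refs;
} refcount_sketch_t;

/* Increment, but stick at UINT_MAX instead of wrapping around to 0. */
static void ref_inc(refcount_sketch_t *r)
{
    unsigned int old = atomic_load(&r->refs);

    do {
        if (old == UINT_MAX)
            return;                 /* saturated: never wrap */
    } while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1));
}

/* Decrement; returns true only when the caller dropped the last reference. */
static bool ref_dec_and_test(refcount_sketch_t *r)
{
    unsigned int old = atomic_load(&r->refs);

    do {
        if (old == UINT_MAX || old == 0)
            return false;           /* saturated or already zero: no free */
    } while (!atomic_compare_exchange_weak(&r->refs, &old, old - 1));
    return old == 1;
}
```

A saturated object is deliberately leaked: leaking memory is preferable to freeing an object that still has users.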
     

16 Jun, 2017

3 commits

  • It seems like a historic accident that these return unsigned char *,
    and in many places that means casts are required, more often than not.

    Make these functions return void * and remove all the casts across
    the tree, adding a (u8 *) cast only where the unsigned char pointer
    was used directly, all done with the following spatch:

    @@
    expression SKB, LEN;
    typedef u8;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    @@
    - *(fn(SKB, LEN))
    + *(u8 *)fn(SKB, LEN)

    @@
    expression E, SKB, LEN;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    type T;
    @@
    - E = ((T *)(fn(SKB, LEN)))
    + E = fn(SKB, LEN)

    @@
    expression SKB, LEN;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    @@
    - fn(SKB, LEN)[0]
    + *(u8 *)fn(SKB, LEN)

    Note that the last part there converts from push(...)[0] to the
    more idiomatic *(u8 *)push(...).

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • It seems like a historic accident that these return unsigned char *,
    and in many places that means casts are required, more often than not.

    Make these functions (skb_put, __skb_put and pskb_put) return void *
    and remove all the casts across the tree, adding a (u8 *) cast only
    where the unsigned char pointer was used directly, all done with the
    following spatch:

    @@
    expression SKB, LEN;
    typedef u8;
    identifier fn = { skb_put, __skb_put };
    @@
    - *(fn(SKB, LEN))
    + *(u8 *)fn(SKB, LEN)

    @@
    expression E, SKB, LEN;
    identifier fn = { skb_put, __skb_put };
    type T;
    @@
    - E = ((T *)(fn(SKB, LEN)))
    + E = fn(SKB, LEN)

    which actually doesn't cover pskb_put since there are only three
    users overall.

    A handful of stragglers were converted manually, notably a macro in
    drivers/isdn/i4l/isdn_bsdcomp.c and, oddly enough, one of the many
    instances in net/bluetooth/hci_sock.c. In the former file, I also
    had to fix one whitespace problem spatch introduced.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • There were many places that my previous spatch didn't find,
    as pointed out by yuan linyu in various patches.

    The following spatch found many more and also removes the
    now unnecessary casts:

    @@
    identifier p, p2;
    expression len;
    expression skb;
    type t, t2;
    @@
    (
    -p = skb_put(skb, len);
    +p = skb_put_zero(skb, len);
    |
    -p = (t)skb_put(skb, len);
    +p = skb_put_zero(skb, len);
    )
    ... when != p
    (
    p2 = (t2)p;
    -memset(p2, 0, len);
    |
    -memset(p, 0, len);
    )

    @@
    type t, t2;
    identifier p, p2;
    expression skb;
    @@
    t *p;
    ...
    (
    -p = skb_put(skb, sizeof(t));
    +p = skb_put_zero(skb, sizeof(t));
    |
    -p = (t *)skb_put(skb, sizeof(t));
    +p = skb_put_zero(skb, sizeof(t));
    )
    ... when != p
    (
    p2 = (t2)p;
    -memset(p2, 0, sizeof(*p));
    |
    -memset(p, 0, sizeof(*p));
    )

    @@
    expression skb, len;
    @@
    -memset(skb_put(skb, len), 0, len);
    +skb_put_zero(skb, len);

    Apply it to the tree (with one manual fixup to keep the
    comment in vxlan.c, which spatch removed.)

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
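The effect of returning void * from the put helpers, and of folding the put-then-memset pattern into a single call, can be sketched on a toy buffer. struct lin_buf, buf_put() and buf_put_zero() are hypothetical simplifications, not struct sk_buff or its API:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified linear buffer. */
struct lin_buf {
    unsigned char data[256];
    size_t len;
};

/*
 * Append len bytes and return a pointer to them, or NULL if they do not
 * fit (the kernel's skb_put() calls skb_over_panic() instead). Returning
 * void * lets callers assign the result to any object pointer type.
 */
static void *buf_put(struct lin_buf *b, size_t len)
{
    void *tail;

    if (len > sizeof(b->data) - b->len)
        return NULL;
    tail = b->data + b->len;
    b->len += len;
    return tail;
}

/* Same as buf_put(), but zero the new area: the put + memset pair in one. */
static void *buf_put_zero(struct lin_buf *b, size_t len)
{
    void *p = buf_put(b, len);

    if (p != NULL)
        memset(p, 0, len);
    return p;
}
```

Because buf_put() returns void *, a caller can write struct foo *h = buf_put(&b, sizeof(*h)); without a cast (struct foo standing for any header type), and a single byte is written as *(unsigned char *)buf_put(&b, 1) = v, the same idiom the first spatch introduced with (u8 *).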
     

09 Jan, 2017

1 commit

  • Extract the remaining two fields from tc_verd and remove the __u16
    completely. TC_AT and TC_FROM are converted to equivalent two-bit
    integer fields tc_at and tc_from. Where possible, use existing
    helper skb_at_tc_ingress when reading tc_at. Introduce helper
    skb_reset_tc to clear fields.

    Not documenting tc_from and tc_at, because they will be replaced
    with single bit fields in follow-on patches.

    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
     

18 Nov, 2016

1 commit

  • Make struct pernet_operations::id unsigned.

    There are 2 reasons to do so:

    1)
    This field is really an index into a zero-based array and
    thus an unsigned entity. Using a negative value is an out-of-bounds
    access by definition.

    2)
    On x86_64, unsigned 32-bit data mixed with pointers (via array
    indexing, or as offsets added to or subtracted from pointers)
    is preferred over signed 32-bit data.

    "int" being used as an array index needs to be sign-extended
    to 64-bit before being used.

    void f(long *p, int i)
    {
        g(p[i]);
    }

    roughly translates to

    movsx rsi, esi
    mov rdi, [rsi+...]
    call g

    MOVSX is a 3-byte instruction, which isn't necessary if the variable
    is unsigned because x86_64 zero-extends by default.

    Now, there is net_generic() function which, you guessed it right, uses
    "int" as an array index:

    static inline void *net_generic(const struct net *net, int id)
    {
        ...
        ptr = ng->ptr[id - 1];
        ...
    }

    And this function is used a lot, so those sign extensions add up.

    The patch trims ~1730 bytes from an allyesconfig kernel (without all the
    junk that messes with code generation):

    add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)

    Unfortunately some functions actually grow bigger.
    This is a seemingly random artefact of code generation, with the register
    allocator being used differently. gcc decides that some variable
    needs to live in the new r8+ registers, and every access now requires a REX
    prefix. Or it is shifted into r12, so the [r12+0] addressing mode has to be
    used, which is longer than [r8].

    However, the overall balance is negative:

    add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
    function old new delta
    nfsd4_lock 3886 3959 +73
    tipc_link_build_proto_msg 1096 1140 +44
    mac80211_hwsim_new_radio 2776 2808 +32
    tipc_mon_rcv 1032 1058 +26
    svcauth_gss_legacy_init 1413 1429 +16
    tipc_bcbase_select_primary 379 392 +13
    nfsd4_exchange_id 1247 1260 +13
    nfsd4_setclientid_confirm 782 793 +11
    ...
    put_client_renew_locked 494 480 -14
    ip_set_sockfn_get 730 716 -14
    geneve_sock_add 829 813 -16
    nfsd4_sequence_done 721 703 -18
    nlmclnt_lookup_host 708 686 -22
    nfsd4_lockt 1085 1063 -22
    nfs_get_client 1077 1050 -27
    tcf_bpf_init 1106 1076 -30
    nfsd4_encode_fattr 5997 5930 -67
    Total: Before=154856051, After=154854321, chg -0.00%

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
     

17 Oct, 2016

1 commit

    After Jesper's commit back in linux-3.18, we trigger a lockdep
    splat in proc_create_data() while allocating memory from
    pktgen_change_name().

    This patch converts t->if_lock to a mutex, since it is now only
    used from the control path, and adds proper locking to pktgen_change_name():

    1) pktgen_thread_lock to protect the outer loop (iterating threads)
    2) t->if_lock to protect the inner loop (iterating devices)

    Note that before Jesper's patch, pktgen_change_name() was lacking proper
    protection, but lockdep was not able to detect the problem.

    Fixes: 8788370a1d4b ("pktgen: RCU-ify "if_list" to remove lock in next_to_run()")
    Reported-by: John Sperbeck
    Signed-off-by: Eric Dumazet
    Cc: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Eric Dumazet
     

03 Oct, 2016

1 commit

    Commit 879c7220e828 ("net: pktgen: Observe needed_headroom
    of the device") increased the 'pkt_overhead' field value by
    LL_RESERVED_SPACE.
    As a side effect, the generated packet size, computed as:

    /* Eth + IPh + UDPh + mpls */
    datalen = pkt_dev->cur_pkt_size - 14 - 20 - 8 -
              pkt_dev->pkt_overhead;

    is decreased by the same value.
    The above slightly changed the behavior of existing pktgen users,
    and made the procfs interface somewhat inconsistent.
    Fix it by restoring the previous pkt_overhead value and using
    LL_RESERVED_SPACE as extralen in skb allocation.
    Also, change pktgen_alloc_skb() to only partially reserve
    the headroom to allow the caller to prefetch from ll header
    start.

    v1 -> v2:
    - fixed some typos in the comments

    Fixes: 879c7220e828 ("net: pktgen: Observe needed_headroom of the device")
    Suggested-by: Ben Greear
    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Paolo Abeni
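The arithmetic behind the behaviour change can be checked with a hypothetical helper that mirrors the datalen formula quoted above (payload_len() is not a pktgen function):

```c
/*
 * Hypothetical helper reproducing the quoted computation: Ethernet (14),
 * IPv4 (20) and UDP (8) headers plus pkt_overhead are subtracted from the
 * configured packet size to get the payload length.
 */
static int payload_len(int cur_pkt_size, int pkt_overhead)
{
    return cur_pkt_size - 14 - 20 - 8 - pkt_overhead;
}
```

For a 64-byte packet with no extra overhead the payload is 64 - 14 - 20 - 8 = 22 bytes; folding an illustrative 16 bytes of device headroom into pkt_overhead would shrink it to 6 bytes, which is why the fix requests the headroom as extra allocation space instead.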
     

05 Jul, 2016

1 commit


13 Jun, 2016

1 commit

  • sch_atm returns this when TC_ACT_SHOT classification occurs.

    But all other schedulers that use tc_classify
    (htb, hfsc, drr, fq_codel ...) return NET_XMIT_SUCCESS | __BYPASS
    in this case so just do that in atm.

    BATMAN uses it as an intermediate return value to signal
    forwarding vs. buffering, but it did not return POLICED to
    callers outside of BATMAN.

    Reviewed-by: Sven Eckelmann
    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
     

01 Jun, 2016

1 commit


27 Apr, 2016

1 commit


02 Mar, 2016

1 commit


12 Jan, 2016

2 commits


16 Dec, 2015

1 commit

  • These netif flags are unnecessary convolutions. It is more
    straightforward to just use NETIF_F_HW_CSUM, NETIF_F_IP_CSUM,
    and NETIF_F_IPV6_CSUM directly.

    This patch also:
    - Cleans up can_checksum_protocol
    - Simplifies netdev_intersect_features

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
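Testing the three flags directly can be sketched as follows, in the spirit of the simplified can_checksum_protocol(). The FEAT_* bit values are hypothetical (the kernel's NETIF_F_* values differ), and the protocol is taken in host byte order for brevity; 0x0800 and 0x86DD are the standard IPv4 and IPv6 EtherTypes:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits standing in for the NETIF_F_* flags. */
#define FEAT_HW_CSUM   (1u << 0)  /* can checksum any protocol */
#define FEAT_IP_CSUM   (1u << 1)  /* can checksum TCP/UDP over IPv4 */
#define FEAT_IPV6_CSUM (1u << 2)  /* can checksum TCP/UDP over IPv6 */

#define ETHERTYPE_IPV4 0x0800
#define ETHERTYPE_IPV6 0x86DD

/* Decide whether the device can checksum a packet of this protocol. */
static bool can_csum(uint32_t features, uint16_t protocol)
{
    if (features & FEAT_HW_CSUM)
        return true;              /* generic checksum covers everything */

    switch (protocol) {
    case ETHERTYPE_IPV4:
        return (features & FEAT_IP_CSUM) != 0;
    case ETHERTYPE_IPV6:
        return (features & FEAT_IPV6_CSUM) != 0;
    default:
        return false;
    }
}
```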
     

14 Aug, 2015

1 commit


07 Aug, 2015

1 commit

  • Commit 1fbe4b46caca "net: pktgen: kill the Wait for kthread_stop
    code in pktgen_thread_worker()" removed (in particular) the final
    __set_current_state(TASK_RUNNING) and I didn't notice the previous
    set_current_state(TASK_INTERRUPTIBLE). This triggers the warning
    in __might_sleep() after return.

    AFAICS, we can simply remove both set_current_state() calls, and we
    could have done this long ago, right after ef87979c273a2 "pktgen: better
    scheduler friendliness", which changed pktgen_thread_worker() to
    use wait_event_interruptible_timeout().

    Reported-by: Huang Ying
    Signed-off-by: Oleg Nesterov
    Signed-off-by: David S. Miller

    Oleg Nesterov
     

30 Jul, 2015

1 commit