12 Apr, 2018

1 commit

  • [ Upstream commit 5f42df013b8bc1b6511af7a04bf93b014884ae2a ]

    Use dev_valid_name() to make sure user does not provide illegal
    device name.

    syzbot caught the following bug :

    BUG: KASAN: stack-out-of-bounds in strlcpy include/linux/string.h:300 [inline]
    BUG: KASAN: stack-out-of-bounds in ip6gre_tunnel_locate+0x334/0x860 net/ipv6/ip6_gre.c:339
    Write of size 20 at addr ffff8801afb9f7b8 by task syzkaller851048/4466

    CPU: 1 PID: 4466 Comm: syzkaller851048 Not tainted 4.16.0+ #1
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:17 [inline]
    dump_stack+0x1b9/0x29f lib/dump_stack.c:53
    print_address_description+0x6c/0x20b mm/kasan/report.c:256
    kasan_report_error mm/kasan/report.c:354 [inline]
    kasan_report.cold.7+0xac/0x2f5 mm/kasan/report.c:412
    check_memory_region_inline mm/kasan/kasan.c:260 [inline]
    check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267
    memcpy+0x37/0x50 mm/kasan/kasan.c:303
    strlcpy include/linux/string.h:300 [inline]
    ip6gre_tunnel_locate+0x334/0x860 net/ipv6/ip6_gre.c:339
    ip6gre_tunnel_ioctl+0x69d/0x12e0 net/ipv6/ip6_gre.c:1195
    dev_ifsioc+0x43e/0xb90 net/core/dev_ioctl.c:334
    dev_ioctl+0x69a/0xcc0 net/core/dev_ioctl.c:525
    sock_ioctl+0x47e/0x680 net/socket.c:1015
    vfs_ioctl fs/ioctl.c:46 [inline]
    file_ioctl fs/ioctl.c:500 [inline]
    do_vfs_ioctl+0x1cf/0x1650 fs/ioctl.c:684
    ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701
    SYSC_ioctl fs/ioctl.c:708 [inline]
    SyS_ioctl+0x24/0x30 fs/ioctl.c:706
    do_syscall_64+0x29e/0x9d0 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x42/0xb7

    Fixes: c12b395a4664 ("gre: Support GRE over IPv6")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     

03 Mar, 2018

1 commit

  • [ Upstream commit 2c52129a7d74d017320804c6928de770815c5f4a ]

    The same fix as the patch "ip_gre: remove the incorrect mtu limit for
    ipgre tap" is also needed for ip6_gre.

    Fixes: 61e84623ace3 ("net: centralize net_device min/max MTU checking")
    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Xin Long
     

31 Jan, 2018

1 commit

  • [ Upstream commit 128bb975dc3c25d00de04e503e2fe0a780d04459 ]

    Commit b05229f44228 ("gre6: Cleanup GREv6 transmit path,
    call common GRE functions") moved dev->mtu initialization
    from ip6gre_tunnel_setup() to ip6gre_tunnel_init(), as a
    result, the previously set values, before ndo_init(), are
    reset in the following cases:

    * rtnl_create_link() can update dev->mtu from IFLA_MTU
    parameter.

    * ip6gre_tnl_link_config() is invoked before ndo_init() in
    netlink and ioctl setup, so ndo_init() can reset MTU
    adjustments with the lower device MTU as well, dev->mtu
    and dev->hard_header_len.

    Not applicable for ip6gretap because it has one more call
    to ip6gre_tnl_link_config(tunnel, 1) in ip6gre_tap_init().

    Fix the first case by updating dev->mtu with 'tb[IFLA_MTU]'
    parameter if a user sets it manually on a device creation,
    and fix the second one by moving ip6gre_tnl_link_config()
    call after register_netdevice().

    Fixes: b05229f44228 ("gre6: Cleanup GREv6 transmit path, call common GRE functions")
    Fixes: db2ec95d1ba4 ("ip6_gre: Fix MTU setting")
    Signed-off-by: Alexey Kodanev
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Alexey Kodanev
     

03 Jan, 2018

1 commit

  • [ Upstream commit e5a9336adb317db55eb3fe8200856096f3c71109 ]

    When ip6gre is created using ioctl, its features, such as
    scatter-gather, GSO and tx-checksumming will be turned off:

    # ip -f inet6 tunnel add gre6 mode ip6gre remote fd00::1
    # ethtool -k gre6 (truncated output)
    tx-checksumming: off
    scatter-gather: off
    tcp-segmentation-offload: off
    generic-segmentation-offload: off [requested on]

    But when netlink is used, they will be enabled:
    # ip link add gre6 type ip6gre remote fd00::1
    # ethtool -k gre6 (truncated output)
    tx-checksumming: on
    scatter-gather: on
    tcp-segmentation-offload: on
    generic-segmentation-offload: on

    This results in a loss of performance when gre6 is created via ioctl.
    The issue was found with LTP/gre tests.

    Fix it by moving the setup of device features to a separate function
    and invoke it with ndo_init callback because both netlink and ioctl
    will eventually call it via register_netdevice():

    register_netdevice()
    - ndo_init() callback -> ip6gre_tunnel_init() or ip6gre_tap_init()
    - ip6gre_tunnel_init_common()
    - ip6gre_tnl_init_features()

    The moved code also contains two minor style fixes:
    * removed needless tab from GRE6_FEATURES on NETIF_F_HIGHDMA line.
    * fixed the issue reported by checkpatch: "Unnecessary parentheses around
    'nt->encap.type == TUNNEL_ENCAP_NONE'"

    Fixes: ac4eb009e477 ("ip6gre: Add support for basic offloads offloads excluding GSO")
    Signed-off-by: Alexey Kodanev
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Alexey Kodanev
     

14 Dec, 2017

1 commit

  • [ Upstream commit 981542c526ecd846920bc500e9989da906ee9fb9 ]

    After commit 308edfdf1563 ("gre6: Cleanup GREv6 receive path, call
    common GRE functions") it's not used anywhere in the module, but
    previously was used in ip6gre_rcv().

    Fixes: 308edfdf1563 ("gre6: Cleanup GREv6 receive path, call common GRE functions")
    Signed-off-by: Alexey Kodanev
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Alexey Kodanev
     

27 Oct, 2017

2 commits

  • When receiving a Toobig icmpv6 packet, ip6gre_err would just set
    tunnel dev's mtu, that's not enough. For skb_dst(skb)'s pmtu may
    still be using the old value, it has no chance to be updated with
    tunnel dev's mtu.

    Jianlin found this issue by reducing route's mtu while running
    netperf, the performance went to 0.

    ip6ip6 and ip4ip6 tunnel can work well with this, as they lookup
    the upper dst and update_pmtu it's pmtu or icmpv6_send a Toobig
    to upper socket after setting tunnel dev's mtu.

    We couldn't do that for ip6_gre, as gre's inner packet could be
    any protocol, it's difficult to handle them (like lookup upper
    dst) in a good way.

    So this patch is to fix it by updating skb_dst(skb)'s pmtu when
    dev->mtu < skb_dst(skb)'s pmtu in tx path. It's safe to do this
    update there, as usually dev->mtu
    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     
  • The similar fix in patch 'ipip: only increase err_count for some
    certain type icmp in ipip_err' is needed for ip6gre_err.

    In Jianlin's case, udp netperf broke even when receiving a TooBig
    icmpv6 packet.

    Fixes: c12b395a4664 ("gre: Support GRE over IPv6")
    Reported-by: Jianlin Shi
    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     

01 Oct, 2017

1 commit


19 Sep, 2017

1 commit

  • Now in ip6gre_header before packing the ipv6 header, it skb_push t->hlen
    which only includes encap_hlen + tun_hlen. It means greh and inner header
    would be over written by ipv6 stuff and ipv6h might have no chance to set
    up.

    Jianlin found this issue when using remote any on ip6_gre, the packets he
    captured on gre dev are truncated:

    22:50:26.210866 Out ethertype IPv6 (0x86dd), length 120: truncated-ip6 -\
    8128 bytes missing!(flowlabel 0x92f40, hlim 0, next-header Options (0) \
    payload length: 8192) ::1:2000:0 > ::1:0:86dd: HBH [trunc] ip-proto-128 \
    8184

    It should also skb_push ipv6hdr so that ipv6h points to the right position
    to set ipv6 stuff up.

    This patch is to skb_push hlen + sizeof(*ipv6h) and also fix some indents
    in ip6gre_header.

    Fixes: c12b395a4664 ("gre: Support GRE over IPv6")
    Reported-by: Jianlin Shi
    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     

08 Sep, 2017

1 commit

  • Now when probessing ICMPV6_PKT_TOOBIG, ip6gre_err only subtracts the
    offset of gre header from mtu info. The expected mtu of gre device
    should also subtract gre header. Otherwise, the next packets still
    can't be sent out.

    Jianlin found this issue when using the topo:
    client(ip6gre)(nic1)route(nic2)(ip6gre)server

    and reducing nic2's mtu, then both tcp and sctp's performance with
    big size data became 0.

    This patch is to fix it by also subtracting grehdr (tun->tun_hlen)
    from mtu info when updating gre device's mtu in ip6gre_err(). It
    also needs to subtract ETH_HLEN if gre dev'type is ARPHRD_ETHER.

    Reported-by: Jianlin Shi
    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     

27 Jun, 2017

3 commits


16 Jun, 2017

1 commit

  • It seems like a historic accident that these return unsigned char *,
    and in many places that means casts are required, more often than not.

    Make these functions return void * and remove all the casts across
    the tree, adding a (u8 *) cast only where the unsigned char pointer
    was used directly, all done with the following spatch:

    @@
    expression SKB, LEN;
    typedef u8;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    @@
    - *(fn(SKB, LEN))
    + *(u8 *)fn(SKB, LEN)

    @@
    expression E, SKB, LEN;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    type T;
    @@
    - E = ((T *)(fn(SKB, LEN)))
    + E = fn(SKB, LEN)

    @@
    expression SKB, LEN;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    @@
    - fn(SKB, LEN)[0]
    + *(u8 *)fn(SKB, LEN)

    Note that the last part there converts from push(...)[0] to the
    more idiomatic *(u8 *)push(...).

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

08 Jun, 2017

1 commit

  • Network devices can allocate reasources and private memory using
    netdev_ops->ndo_init(). However, the release of these resources
    can occur in one of two different places.

    Either netdev_ops->ndo_uninit() or netdev->destructor().

    The decision of which operation frees the resources depends upon
    whether it is necessary for all netdev refs to be released before it
    is safe to perform the freeing.

    netdev_ops->ndo_uninit() presumably can occur right after the
    NETDEV_UNREGISTER notifier completes and the unicast and multicast
    address lists are flushed.

    netdev->destructor(), on the other hand, does not run until the
    netdev references all go away.

    Further complicating the situation is that netdev->destructor()
    almost universally does also a free_netdev().

    This creates a problem for the logic in register_netdevice().
    Because all callers of register_netdevice() manage the freeing
    of the netdev, and invoke free_netdev(dev) if register_netdevice()
    fails.

    If netdev_ops->ndo_init() succeeds, but something else fails inside
    of register_netdevice(), it does call ndo_ops->ndo_uninit(). But
    it is not able to invoke netdev->destructor().

    This is because netdev->destructor() will do a free_netdev() and
    then the caller of register_netdevice() will do the same.

    However, this means that the resources that would normally be released
    by netdev->destructor() will not be.

    Over the years drivers have added local hacks to deal with this, by
    invoking their destructor parts by hand when register_netdevice()
    fails.

    Many drivers do not try to deal with this, and instead we have leaks.

    Let's close this hole by formalizing the distinction between what
    private things need to be freed up by netdev->destructor() and whether
    the driver needs unregister_netdevice() to perform the free_netdev().

    netdev->priv_destructor() performs all actions to free up the private
    resources that used to be freed by netdev->destructor(), except for
    free_netdev().

    netdev->needs_free_netdev is a boolean that indicates whether
    free_netdev() should be done at the end of unregister_netdevice().

    Now, register_netdevice() can sanely release all resources after
    ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit()
    and netdev->priv_destructor().

    And at the end of unregister_netdevice(), we invoke
    netdev->priv_destructor() and optionally call free_netdev().

    Signed-off-by: David S. Miller

    David S. Miller
     

27 May, 2017

1 commit

  • This fix addresses two problems in the way the DSCP field is formulated
    on the encapsulating header of IPv6 tunnels.
    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=195661

    1) The IPv6 tunneling code was manipulating the DSCP field of the
    encapsulating packet using the 32b flowlabel. Since the flowlabel is
    only the lower 20b it was incorrect to assume that the upper 12b
    containing the DSCP and ECN fields would remain intact when formulating
    the encapsulating header. This fix handles the 'inherit' and
    'fixed-value' DSCP cases explicitly using the extant dsfield u8 variable.

    2) The use of INET_ECN_encapsulate(0, dsfield) in ip6_tnl_xmit was
    incorrect and resulted in the DSCP value always being set to 0.

    Commit 90427ef5d2a4 ("ipv6: fix flow labels when the traffic class
    is non-0") caused the regression by masking out the flowlabel
    which exposed the incorrect handling of the DSCP portion of the
    flowlabel in ip6_tunnel and ip6_gre.

    Fixes: 90427ef5d2a4 ("ipv6: fix flow labels when the traffic class is non-0")
    Signed-off-by: Peter Dawson
    Signed-off-by: David S. Miller

    Peter Dawson
     

22 Apr, 2017

1 commit


08 Feb, 2017

1 commit


06 Feb, 2017

1 commit

  • Andrey Konovalov reported out of bound accesses in ip6gre_err()

    If GRE flags contains GRE_KEY, the following expression
    *(((__be32 *)p) + (grehlen / 4) - 1)

    accesses data ~40 bytes after the expected point, since
    grehlen includes the size of IPv6 headers.

    Let's use a "struct gre_base_hdr *greh" pointer to make this
    code more readable.

    p[1] becomes greh->protocol.
    grhlen is the GRE header length.

    Fixes: c12b395a4664 ("gre: Support GRE over IPv6")
    Signed-off-by: Eric Dumazet
    Reported-by: Andrey Konovalov
    Signed-off-by: David S. Miller

    Eric Dumazet
     

28 Jan, 2017

1 commit


27 Jan, 2017

1 commit


25 Jan, 2017

1 commit

  • Since ip6_tnl_parse_tlv_enc_lim() can call pskb_may_pull(),
    we must reload any pointer that was related to skb->head
    (or skb->data), or risk use after free.

    Fixes: c12b395a4664 ("gre: Support GRE over IPv6")
    Signed-off-by: Eric Dumazet
    Cc: Dmitry Kozlov
    Signed-off-by: David S. Miller

    Eric Dumazet
     

21 Jan, 2017

1 commit


18 Nov, 2016

1 commit

  • Make struct pernet_operations::id unsigned.

    There are 2 reasons to do so:

    1)
    This field is really an index into an zero based array and
    thus is unsigned entity. Using negative value is out-of-bound
    access by definition.

    2)
    On x86_64 unsigned 32-bit data which are mixed with pointers
    via array indexing or offsets added or subtracted to pointers
    are preffered to signed 32-bit data.

    "int" being used as an array index needs to be sign-extended
    to 64-bit before being used.

    void f(long *p, int i)
    {
    g(p[i]);
    }

    roughly translates to

    movsx rsi, esi
    mov rdi, [rsi+...]
    call g

    MOVSX is 3 byte instruction which isn't necessary if the variable is
    unsigned because x86_64 is zero extending by default.

    Now, there is net_generic() function which, you guessed it right, uses
    "int" as an array index:

    static inline void *net_generic(const struct net *net, int id)
    {
    ...
    ptr = ng->ptr[id - 1];
    ...
    }

    And this function is used a lot, so those sign extensions add up.

    Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
    messing with code generation):

    add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)

    Unfortunately some functions actually grow bigger.
    This is a semmingly random artefact of code generation with register
    allocator being used differently. gcc decides that some variable
    needs to live in new r8+ registers and every access now requires REX
    prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be
    used which is longer than [r8]

    However, overall balance is in negative direction:

    add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
    function old new delta
    nfsd4_lock 3886 3959 +73
    tipc_link_build_proto_msg 1096 1140 +44
    mac80211_hwsim_new_radio 2776 2808 +32
    tipc_mon_rcv 1032 1058 +26
    svcauth_gss_legacy_init 1413 1429 +16
    tipc_bcbase_select_primary 379 392 +13
    nfsd4_exchange_id 1247 1260 +13
    nfsd4_setclientid_confirm 782 793 +11
    ...
    put_client_renew_locked 494 480 -14
    ip_set_sockfn_get 730 716 -14
    geneve_sock_add 829 813 -16
    nfsd4_sequence_done 721 703 -18
    nlmclnt_lookup_host 708 686 -22
    nfsd4_lockt 1085 1063 -22
    nfs_get_client 1077 1050 -27
    tcf_bpf_init 1106 1076 -30
    nfsd4_encode_fattr 5997 5930 -67
    Total: Before=154856051, After=154854321, chg -0.00%

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
     

05 Nov, 2016

1 commit

  • - Use the UID in routing lookups made by protocol connect() and
    sendmsg() functions.
    - Make sure that routing lookups triggered by incoming packets
    (e.g., Path MTU discovery) take the UID of the socket into
    account.
    - For packets not associated with a userspace socket, (e.g., ping
    replies) use UID 0 inside the user namespace corresponding to
    the network namespace the socket belongs to. This allows
    all namespaces to apply routing and iptables rules to
    kernel-originated traffic in that namespaces by matching UID 0.
    This is better than using the UID of the kernel socket that is
    sending the traffic, because the UID of kernel sockets created
    at namespace creation time (e.g., the per-processor ICMP and
    TCP sockets) is the UID of the user that created the socket,
    which might not be mapped in the namespace.

    Tested: compiles allnoconfig, allyesconfig, allmodconfig
    Tested: https://android-review.googlesource.com/253302
    Signed-off-by: Lorenzo Colitti
    Signed-off-by: David S. Miller

    Lorenzo Colitti
     

03 Oct, 2016

1 commit


25 Sep, 2016

1 commit


24 Sep, 2016

1 commit

  • Similar to commit 3be07244b733 ("ip6_gre: fix flowi6_proto value in
    xmit path"), set flowi6_proto to IPPROTO_GRE for output route lookup.

    Up until now, ip6gre_xmit_other() has set flowi6_proto to a bogus value.
    This affected output route lookup for packets sent on an ip6gretap device
    in cases where routing was dependent on the value of flowi6_proto.

    Since the correct proto is already set in the tunnel flowi6 template via
    commit 252f3f5a1189 ("ip6_gre: Set flowi6_proto as IPPROTO_GRE in xmit
    path."), simply delete the line setting the incorrect flowi6_proto value.

    Suggested-by: Jiri Benc
    Fixes: c12b395a4664 ("gre: Support GRE over IPv6")
    Reviewed-by: Shmulik Ladkani
    Signed-off-by: Lance Richardson
    Signed-off-by: David S. Miller

    Lance Richardson
     

18 Aug, 2016

1 commit


16 Aug, 2016

1 commit

  • Ensure that the inner_protocol is set on transmit so that GSO segmentation,
    which relies on that field, works correctly.

    This is achieved by setting the inner_protocol in gre_build_header rather
    than each caller of that function. It ensures that the inner_protocol is
    set when gre_fb_xmit() is used to transmit GRE which was not previously the
    case.

    I have observed this is not the case when OvS transmits GRE using
    lwtunnel metadata (which it always does).

    Fixes: 38720352412a ("gre: Use inner_proto to obtain inner header protocol")
    Cc: Pravin Shelar
    Acked-by: Alexander Duyck
    Signed-off-by: Simon Horman
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Simon Horman
     

11 Aug, 2016

1 commit

  • This is a preparatory patch for converting qdisc linked list into a
    hashtable. As we'll need to include hashtable.h in netdevice.h, we first
    have to make sure that this will not introduce symbol conflicts for any of
    the netdevice.h users.

    Reviewed-by: Cong Wang
    Signed-off-by: Jiri Kosina
    Signed-off-by: David S. Miller

    Jiri Kosina
     

16 Jun, 2016

1 commit

  • 1) gre_parse_header() can be called from gre_err()

    At this point transport header points to ICMP header, not the inner
    header.

    2) We can not really change transport header as ipgre_err() will later
    assume transport header still points to ICMP header (using icmp_hdr())

    3) pskb_may_pull() logic in gre_parse_header() really works
    if we are interested at zone pointed by skb->data

    4) As Jiri explained in commit b7f8fe251e46 ("gre: do not pull header in
    ICMP error processing") we should not pull headers in error handler.

    So this fix :

    A) changes gre_parse_header() to use skb->data instead of
    skb_transport_header()

    B) Adds a nhs parameter to gre_parse_header() so that we can skip the
    not pulled IP header from error path.
    This offset is 0 for normal receive path.

    C) remove obsolete IPV6 includes

    Signed-off-by: Eric Dumazet
    Cc: Tom Herbert
    Cc: Maciej Żenczykowski
    Cc: Jiri Benc
    Signed-off-by: David S. Miller

    Eric Dumazet
     

09 Jun, 2016

1 commit


25 May, 2016

2 commits


21 May, 2016

2 commits


13 May, 2016

2 commits


10 May, 2016

1 commit