23 Jan, 2019

2 commits

  • [ Upstream commit 4a06fa67c4da20148803525151845276cdb995c1 ]

    Commit 2efd4fca703a ("ip: in cmsg IP(V6)_ORIGDSTADDR call
    pskb_may_pull") avoided a read beyond the end of the skb linear
    segment by calling pskb_may_pull.

    That function can trigger a BUG_ON in pskb_expand_head if the skb is
    shared, which it is when peeking. It can also return ENOMEM.

    Avoid both by switching to safer skb_header_pointer.

    Fixes: 2efd4fca703a ("ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull")
    Reported-by: syzbot
    Suggested-by: Eric Dumazet
    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Willem de Bruijn
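
The difference between the two access styles can be sketched with a toy model. This is only an illustration under stated assumptions: `ToySkb`, `toy_may_pull` and `toy_header_pointer` are hypothetical names, not kernel APIs, and the model ignores everything except "head plus fragments" and "shared or not".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToySkb:
    """Toy packet buffer: a linear head plus zero or more paged fragments."""
    head: bytes
    frags: List[bytes] = field(default_factory=list)
    shared: bool = False  # True while another reader (e.g. a peek) holds it


def toy_may_pull(skb: ToySkb, length: int) -> bool:
    """Pull-style access: grows the head in place, so it mutates the buffer."""
    if len(skb.head) >= length:
        return True
    if skb.shared:
        # Stand-in for the BUG_ON described above: reshaping a shared buffer.
        raise RuntimeError("cannot reshape a shared buffer")
    data = skb.head + b"".join(skb.frags)
    if len(data) < length:
        return False
    skb.head, skb.frags = data, []
    return True


def toy_header_pointer(skb: ToySkb, offset: int, length: int) -> Optional[bytes]:
    """Copy-style access: returns the requested bytes and never mutates skb."""
    data = skb.head + b"".join(skb.frags)
    if offset < 0 or offset + length > len(data):
        return None
    return data[offset:offset + length]


# Headers end exactly at the head boundary; the ports live in a fragment.
skb = ToySkb(head=bytes(40), frags=[bytes([0x1F, 0x90, 0x00, 0x35])], shared=True)
ports = toy_header_pointer(skb, 40, 4)
```

The copy-style read works on the shared buffer and leaves it untouched, while the pull-style read would have to reshape it.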
     
  • [ Upstream commit 7d033c9f6a7fd3821af75620a0257db87c2b552a ]

    This patch makes sure the flow label in the IPv6 header
    forged in ipv6_local_error() is initialized.

    BUG: KMSAN: kernel-infoleak in _copy_to_user+0x16b/0x1f0 lib/usercopy.c:32
    CPU: 1 PID: 24675 Comm: syz-executor1 Not tainted 4.20.0-rc7+ #4
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x173/0x1d0 lib/dump_stack.c:113
    kmsan_report+0x12e/0x2a0 mm/kmsan/kmsan.c:613
    kmsan_internal_check_memory+0x455/0xb00 mm/kmsan/kmsan.c:675
    kmsan_copy_to_user+0xab/0xc0 mm/kmsan/kmsan_hooks.c:601
    _copy_to_user+0x16b/0x1f0 lib/usercopy.c:32
    copy_to_user include/linux/uaccess.h:177 [inline]
    move_addr_to_user+0x2e9/0x4f0 net/socket.c:227
    ___sys_recvmsg+0x5d7/0x1140 net/socket.c:2284
    __sys_recvmsg net/socket.c:2327 [inline]
    __do_sys_recvmsg net/socket.c:2337 [inline]
    __se_sys_recvmsg+0x2fa/0x450 net/socket.c:2334
    __x64_sys_recvmsg+0x4a/0x70 net/socket.c:2334
    do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
    entry_SYSCALL_64_after_hwframe+0x63/0xe7
    RIP: 0033:0x457ec9
    Code: 6d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 0f 83 3b b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007f8750c06c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002f
    RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457ec9
    RDX: 0000000000002000 RSI: 0000000020000400 RDI: 0000000000000005
    RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8750c076d4
    R13: 00000000004c4a60 R14: 00000000004d8140 R15: 00000000ffffffff

    Uninit was stored to memory at:
    kmsan_save_stack_with_flags mm/kmsan/kmsan.c:204 [inline]
    kmsan_save_stack mm/kmsan/kmsan.c:219 [inline]
    kmsan_internal_chain_origin+0x134/0x230 mm/kmsan/kmsan.c:439
    __msan_chain_origin+0x70/0xe0 mm/kmsan/kmsan_instr.c:200
    ipv6_recv_error+0x1e3f/0x1eb0 net/ipv6/datagram.c:475
    udpv6_recvmsg+0x398/0x2ab0 net/ipv6/udp.c:335
    inet_recvmsg+0x4fb/0x600 net/ipv4/af_inet.c:830
    sock_recvmsg_nosec net/socket.c:794 [inline]
    sock_recvmsg+0x1d1/0x230 net/socket.c:801
    ___sys_recvmsg+0x4d5/0x1140 net/socket.c:2278
    __sys_recvmsg net/socket.c:2327 [inline]
    __do_sys_recvmsg net/socket.c:2337 [inline]
    __se_sys_recvmsg+0x2fa/0x450 net/socket.c:2334
    __x64_sys_recvmsg+0x4a/0x70 net/socket.c:2334
    do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
    entry_SYSCALL_64_after_hwframe+0x63/0xe7

    Uninit was created at:
    kmsan_save_stack_with_flags mm/kmsan/kmsan.c:204 [inline]
    kmsan_internal_poison_shadow+0x92/0x150 mm/kmsan/kmsan.c:158
    kmsan_kmalloc+0xa6/0x130 mm/kmsan/kmsan_hooks.c:176
    kmsan_slab_alloc+0xe/0x10 mm/kmsan/kmsan_hooks.c:185
    slab_post_alloc_hook mm/slab.h:446 [inline]
    slab_alloc_node mm/slub.c:2759 [inline]
    __kmalloc_node_track_caller+0xe18/0x1030 mm/slub.c:4383
    __kmalloc_reserve net/core/skbuff.c:137 [inline]
    __alloc_skb+0x309/0xa20 net/core/skbuff.c:205
    alloc_skb include/linux/skbuff.h:998 [inline]
    ipv6_local_error+0x1a7/0x9e0 net/ipv6/datagram.c:334
    __ip6_append_data+0x129f/0x4fd0 net/ipv6/ip6_output.c:1311
    ip6_make_skb+0x6cc/0xcf0 net/ipv6/ip6_output.c:1775
    udpv6_sendmsg+0x3f8e/0x45d0 net/ipv6/udp.c:1384
    inet_sendmsg+0x54a/0x720 net/ipv4/af_inet.c:798
    sock_sendmsg_nosec net/socket.c:621 [inline]
    sock_sendmsg net/socket.c:631 [inline]
    __sys_sendto+0x8c4/0xac0 net/socket.c:1788
    __do_sys_sendto net/socket.c:1800 [inline]
    __se_sys_sendto+0x107/0x130 net/socket.c:1796
    __x64_sys_sendto+0x6e/0x90 net/socket.c:1796
    do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
    entry_SYSCALL_64_after_hwframe+0x63/0xe7

    Bytes 4-7 of 28 are uninitialized
    Memory access of size 28 starts at ffff8881937bfce0
    Data copied to user address 0000000020000000

    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
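
The report says bytes 4-7 of a 28-byte copy were uninitialized. A short sketch, assuming the usual Linux layout of `struct sockaddr_in6`, shows that this range lines up with the `sin6_flowinfo` field, which is consistent with the flow label being the missing initialization. The constant names below are hypothetical.

```python
import struct

# Assumed layout of struct sockaddr_in6 on Linux, packed without padding:
#   sin6_family (u16), sin6_port (u16), sin6_flowinfo (u32),
#   sin6_addr (16 bytes), sin6_scope_id (u32)
SOCKADDR_IN6 = struct.Struct("=HHI16sI")

# sin6_flowinfo starts right after the family and port fields.
FLOWINFO_OFFSET = struct.calcsize("=HH")
FLOWINFO_RANGE = range(FLOWINFO_OFFSET, FLOWINFO_OFFSET + 4)

print(SOCKADDR_IN6.size, list(FLOWINFO_RANGE))
```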
     

28 Jul, 2018

1 commit

  • [ Upstream commit 2efd4fca703a6707cad16ab486eaab8fc7f0fd49 ]

    Syzbot reported a read beyond the end of the skb head when returning
    IPV6_ORIGDSTADDR:

    BUG: KMSAN: kernel-infoleak in put_cmsg+0x5ef/0x860 net/core/scm.c:242
    CPU: 0 PID: 4501 Comm: syz-executor128 Not tainted 4.17.0+ #9
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x185/0x1d0 lib/dump_stack.c:113
    kmsan_report+0x188/0x2a0 mm/kmsan/kmsan.c:1125
    kmsan_internal_check_memory+0x138/0x1f0 mm/kmsan/kmsan.c:1219
    kmsan_copy_to_user+0x7a/0x160 mm/kmsan/kmsan.c:1261
    copy_to_user include/linux/uaccess.h:184 [inline]
    put_cmsg+0x5ef/0x860 net/core/scm.c:242
    ip6_datagram_recv_specific_ctl+0x1cf3/0x1eb0 net/ipv6/datagram.c:719
    ip6_datagram_recv_ctl+0x41c/0x450 net/ipv6/datagram.c:733
    rawv6_recvmsg+0x10fb/0x1460 net/ipv6/raw.c:521
    [..]

    This logic and its ipv4 counterpart read the destination port from
    the packet at skb_transport_offset(skb) + 4.

    With MSG_MORE and a local SOCK_RAW sender, syzbot was able to cook a
    packet that stores headers exactly up to skb_transport_offset(skb) in
    the head and the remainder in a frag.

    Call pskb_may_pull before accessing the pointer to ensure that it lies
    in skb head.

    Link: http://lkml.kernel.org/r/CAF=yD-LEJwZj5a1-bAAj2Oy_hKmGygV6rsJ_WOrAYnv-fnayiQ@mail.gmail.com
    Reported-by: syzbot+9adb4b567003cac781f0@syzkaller.appspotmail.com
    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Willem de Bruijn
     

26 Jun, 2018

1 commit

  • [ Upstream commit 6c206b20092a3623184cff9470dba75d21507874 ]

    After commit 6b229cf77d68 ("udp: add batching to udp_rmem_release()")
    the sk_rmem_alloc field does not measure exactly anymore the
    receive queue length, because we batch the rmem release. The issue
    is really apparent only after commit 0d4a6608f68c ("udp: do rmem bulk
    free even if the rx sk queue is empty"): the user space can easily
    check for an empty socket with not-0 queue length reported by the 'ss'
    tool or the procfs interface.

    We need to use a custom UDP helper to report the correct queue length,
    taking into account the forward allocation deficit.

    Reported-by: trevor.francis@46labs.com
    Fixes: 6b229cf77d68 ("udp: add batching to udp_rmem_release()")
    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Paolo Abeni
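
Why the raw counter overstates the queue length can be sketched with a toy model of batched release. The class and function names are hypothetical and the batching threshold is an arbitrary assumption; only the idea "reported length = allocated memory minus the not-yet-released deficit" comes from the description above.

```python
from dataclasses import dataclass


@dataclass
class ToyUdpSock:
    rmem_alloc: int = 0       # bytes still charged to the receive buffer
    forward_deficit: int = 0  # bytes already dequeued but not yet released


def enqueue(sk: ToyUdpSock, size: int) -> None:
    sk.rmem_alloc += size


def dequeue(sk: ToyUdpSock, size: int, batch_threshold: int) -> None:
    """Release memory in batches: small dequeues only grow the deficit."""
    sk.forward_deficit += size
    if sk.forward_deficit >= batch_threshold:
        sk.rmem_alloc -= sk.forward_deficit
        sk.forward_deficit = 0


def rqueue_len(sk: ToyUdpSock) -> int:
    """Queue length corrected for the pending (batched) release."""
    return sk.rmem_alloc - sk.forward_deficit


sk = ToyUdpSock()
enqueue(sk, 1000)
dequeue(sk, 1000, batch_threshold=4096)
```

After the dequeue the socket is empty, yet `rmem_alloc` still reads 1000; the corrected helper reports 0.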
     

01 Apr, 2018

2 commits

  • [ Upstream commit 5f2fb802eee1df0810b47ea251942fe3fd36589a ]

    Fixes: 2f987a76a977 ("net: ipv6: keep sk status consistent after datagram connect failure")
    Signed-off-by: Stefano Brivio
    Acked-by: Paolo Abeni
    Acked-by: Guillaume Nault
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Stefano Brivio
     
  • [ Upstream commit 2f987a76a97773beafbc615b9c4d8fe79129a7f4 ]

    On unsuccessful ip6_datagram_connect(), if the failure is caused by
    ip6_datagram_dst_update(), the sk peer information is cleared, but
    the sk->sk_state is preserved.

    If the socket was already in an established status, the overall sk
    status is inconsistent and fouls later checks in datagram code.

    Fix this by saving the old peer information and restoring it in
    case of failure. This also aligns ipv6 datagram connect() behavior
    with ipv4.

    v1 -> v2:
    - added missing Fixes tag

    Fixes: 85cb73ff9b74 ("net: ipv6: reset daddr and dport in sk if connect() fails")
    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Paolo Abeni
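
The save-and-restore pattern can be sketched as follows. This is a toy model, not the kernel code: `ToySock`, `toy_connect` and the `route_lookup` callback are hypothetical, and real sockets carry more peer state than an address and a port.

```python
from dataclasses import dataclass
from typing import Callable

ESTABLISHED, CLOSE = "ESTABLISHED", "CLOSE"


@dataclass
class ToySock:
    daddr: str = "::"
    dport: int = 0
    state: str = CLOSE


def toy_connect(sk: ToySock, daddr: str, dport: int,
                route_lookup: Callable[[str], bool]) -> bool:
    """Connect, rolling the peer information back if the route lookup fails."""
    old_peer = (sk.daddr, sk.dport)    # save before touching anything
    sk.daddr, sk.dport = daddr, dport
    if not route_lookup(daddr):
        sk.daddr, sk.dport = old_peer  # restore: peer and state stay consistent
        return False
    sk.state = ESTABLISHED
    return True


sk = ToySock()
first = toy_connect(sk, "2001:db8::1", 53, lambda addr: True)
second = toy_connect(sk, "2001:db8::2", 53, lambda addr: False)
```

After the failed second call the socket is still established towards the first peer, rather than established with a cleared peer.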
     

01 Jul, 2017

1 commit

  • refcount_t type and corresponding API should be
    used instead of atomic_t when the variable is used as
    a reference counter. This helps avoid accidental
    refcounter overflows that might lead to use-after-free
    situations.

    This patch uses refcount_inc_not_zero() instead of
    atomic_inc_not_zero_hint() due to absence of a _hint()
    version of refcount API. If the hint() version must
    be used, we might need to revisit API.

    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Signed-off-by: David S. Miller

    Reshetova, Elena
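
The idea of a counter that refuses to revive a dead object and saturates instead of wrapping can be sketched with a toy class. `ToyRefcount` is hypothetical, is not thread-safe, and its saturation value is an arbitrary assumption rather than the kernel's.

```python
class ToyRefcount:
    """Toy reference counter: never wraps, never increments from zero."""

    SATURATED = 2**32 - 1  # assumed ceiling for this sketch

    def __init__(self, value: int = 1) -> None:
        self.value = value

    def inc_not_zero(self) -> bool:
        """Take a reference unless the object is already dead (count == 0)."""
        if self.value == 0:
            return False
        if self.value < self.SATURATED:
            self.value += 1
        return True  # a saturated counter stays pinned at the ceiling

    def dec_and_test(self) -> bool:
        """Drop a reference; return True when the last one is gone."""
        if self.value == 0:
            raise ValueError("refcount underflow")
        if self.value == self.SATURATED:
            return False  # saturated objects are leaked rather than freed
        self.value -= 1
        return self.value == 0


ref = ToyRefcount(1)
took = ref.inc_not_zero()
```

A plain wrapping counter would eventually reach zero again after an overflow and free an object that is still in use; leaking a saturated object is the safer failure mode.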
     

25 Jun, 2017

1 commit

  • In __ip6_datagram_connect(), reset sk->sk_v6_daddr and inet->dport if
    error occurs.
    In udp_v6_early_demux(), check for sk_state to make sure it is in
    TCP_ESTABLISHED state.
    Together, it makes sure unconnected UDP socket won't be considered as a
    valid candidate for early demux.

    v3: add TCP_ESTABLISHED state check in udp_v6_early_demux()
    v2: fix compilation error

    Fixes: 5425077d73e0 ("net: ipv6: Add early demux handler for UDP unicast")
    Signed-off-by: Wei Wang
    Acked-by: Maciej Żenczykowski
    Signed-off-by: David S. Miller

    Wei Wang
     

18 Apr, 2017

1 commit

  • Syzkaller reported a use-after-free in ip_recv_error at line

    info->ipi_ifindex = skb->dev->ifindex;

    This function is called on dequeue from the error queue, at which
    point the device pointer may no longer be valid.

    Save ifindex on enqueue in __skb_complete_tx_timestamp, when the
    pointer is valid or NULL. Store it in temporary storage skb->cb.

    It is safe to reference skb->dev here, as called from device drivers
    or dev_queue_xmit. The exception is when called from tcp_ack_tstamp;
    in that case it is NULL and ifindex is set to 0 (invalid).

    Do not return a pktinfo cmsg if ifindex is 0. This maintains the
    current behavior of not returning a cmsg if skb->dev was NULL.

    On dequeue, the ipv4 path will cast from sock_exterr_skb to
    in_pktinfo. Both have ifindex as their first element, so no explicit
    conversion is needed. This is by design, introduced in commit
    0b922b7a829c ("net: original ingress device index in PKTINFO"). For
    ipv6 ip6_datagram_support_cmsg converts to in6_pktinfo.

    Fixes: 829ae9d61165 ("net-timestamp: allow reading recv cmsg on errqueue with origin tstamp")
    Reported-by: Andrey Konovalov
    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
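
The enqueue-time capture described above can be sketched as a toy error queue. All names here (`ToyDev`, `ToyErrSkb`, `enqueue_error`, `dequeue_pktinfo`) are hypothetical; the sketch only shows that the index is copied while the device is known to be valid, and that index 0 suppresses the pktinfo result.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyDev:
    ifindex: int


@dataclass
class ToyErrSkb:
    dev: Optional[ToyDev]
    cb: Dict[str, int] = field(default_factory=dict)  # scratch storage


def enqueue_error(queue: List[ToyErrSkb], skb: ToyErrSkb) -> None:
    """Copy the ifindex while the device pointer is valid (or None)."""
    skb.cb["ifindex"] = skb.dev.ifindex if skb.dev is not None else 0
    skb.dev = None  # after this point the device must not be dereferenced
    queue.append(skb)


def dequeue_pktinfo(queue: List[ToyErrSkb]) -> Optional[Dict[str, int]]:
    """Build pktinfo from the saved index; index 0 means 'no cmsg'."""
    skb = queue.pop(0)
    ifindex = skb.cb["ifindex"]
    return {"ipi_ifindex": ifindex} if ifindex else None


errqueue: List[ToyErrSkb] = []
enqueue_error(errqueue, ToyErrSkb(dev=ToyDev(ifindex=3)))
enqueue_error(errqueue, ToyErrSkb(dev=None))
```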
     

15 Feb, 2017

1 commit

  • This patch adds a check on the type of the source address for the case
    where the destination address is in6addr_any. If the source is an
    IPv4-mapped IPv6 source address, the destination is changed to
    ::ffff:127.0.0.1, and otherwise the destination is changed to ::1. This
    is done in three locations to handle UDP calls to either connect() or
    sendmsg() and TCP calls to connect(). Note that udpv6_sendmsg() delays
    handling an in6addr_any destination until very late, so the patch only
    needs to handle the case where the source is an IPv4-mapped IPv6
    address.

    Signed-off-by: Jonathan T. Leighton
    Signed-off-by: David S. Miller

    Jonathan T. Leighton
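
The address selection rule can be sketched in user space with the standard `ipaddress` module. The function name `rewrite_any_destination` is hypothetical; the two loopback targets are the ones named in the description above.

```python
import ipaddress


def rewrite_any_destination(src: str, dst: str) -> str:
    """Pick a loopback destination when dst is the unspecified address (::)."""
    dst_addr = ipaddress.IPv6Address(dst)
    if not dst_addr.is_unspecified:
        return str(dst_addr)  # only in6addr_any is rewritten
    if ipaddress.IPv6Address(src).ipv4_mapped is not None:
        return "::ffff:127.0.0.1"  # IPv4-mapped source: IPv4-mapped loopback
    return "::1"                   # otherwise: IPv6 loopback


mapped = rewrite_any_destination("::ffff:192.0.2.1", "::")
native = rewrite_any_destination("2001:db8::1", "::")
```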
     

25 Dec, 2016

1 commit


24 Dec, 2016

1 commit

  • Socket cmsg IP(V6)_RECVORIGDSTADDR checks that port range lies within
    the packet. For sockets that have transport headers pulled, transport
    offset can be negative. Use signed comparison to avoid overflow.

    Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing")
    Reported-by: Nisar Jagabar
    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
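
The overflow can be reproduced with a small model of 32-bit arithmetic. The exact kernel expression is an assumption here (a bounds check of the form "offset + 4 <= length"); the function names are hypothetical.

```python
def fits_unsigned(transport_offset: int, skb_len: int) -> bool:
    """Model the check when the sum is converted to a 32-bit unsigned value."""
    return ((transport_offset + 4) & 0xFFFFFFFF) <= (skb_len & 0xFFFFFFFF)


def fits_signed(transport_offset: int, skb_len: int) -> bool:
    """Model the check with both sides compared as signed integers."""
    return transport_offset + 4 <= skb_len


# Transport header already pulled: the offset is negative.
unsigned_result = fits_unsigned(-8, 100)
signed_result = fits_signed(-8, 100)
```

With an offset of -8 the unsigned sum becomes a huge positive number, so the check fails even though the ports are within the buffer; the signed comparison accepts it.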
     

04 Dec, 2016

1 commit

  • Couple conflicts resolved here:

    1) In the MACB driver, a bug fix to properly initialize the
    RX tail pointer overlapped with some changes
    to support variable sized rings.

    2) In XGBE we had a "CONFIG_PM" --> "CONFIG_PM_SLEEP" fix
    overlapping with a reorganization of the driver to support
    ACPI, OF, as well as PCI variants of the chip.

    3) In 'net' we had several probe error path bug fixes to the
    stmmac driver, meanwhile a lot of this code was cleaned up
    and reorganized in 'net-next'.

    4) The cls_flower classifier obtained a helper function in
    'net-next' called __fl_delete() and this overlapped with
    Daniel Borkmann's bug fix to use RCU for object destruction
    in 'net'. It also overlapped with Jiri's change to guard
    the rhashtable_remove_fast() call with a check against
    tc_skip_sw().

    5) In mlx4, a revert bug fix in 'net' overlapped with some
    unrelated changes in 'net-next'.

    6) In geneve, a stale header pointer after pskb_expand_head()
    bug fix in 'net' overlapped with a large reorganization of
    the same code in 'net-next'. Since the 'net-next' code no
    longer had the bug in question, there was nothing to do
    other than to simply take the 'net-next' hunks.

    Signed-off-by: David S. Miller

    David S. Miller
     

01 Dec, 2016

1 commit


05 Nov, 2016

1 commit

  • - Use the UID in routing lookups made by protocol connect() and
    sendmsg() functions.
    - Make sure that routing lookups triggered by incoming packets
    (e.g., Path MTU discovery) take the UID of the socket into
    account.
    - For packets not associated with a userspace socket, (e.g., ping
    replies) use UID 0 inside the user namespace corresponding to
    the network namespace the socket belongs to. This allows
    all namespaces to apply routing and iptables rules to
    kernel-originated traffic in that namespace by matching UID 0.
    This is better than using the UID of the kernel socket that is
    sending the traffic, because the UID of kernel sockets created
    at namespace creation time (e.g., the per-processor ICMP and
    TCP sockets) is the UID of the user that created the socket,
    which might not be mapped in the namespace.

    Tested: compiles allnoconfig, allyesconfig, allmodconfig
    Tested: https://android-review.googlesource.com/253302
    Signed-off-by: Lorenzo Colitti
    Signed-off-by: David S. Miller

    Lorenzo Colitti
     

04 Nov, 2016

1 commit

  • When reading a datagram or raw packet that arrived fragmented, expose
    the maximum fragment size if recorded to allow applications to
    estimate receive path MTU.

    At this point, the field is only recorded when ipv6 connection
    tracking is enabled. A follow-up patch will record this field also
    in the ipv6 input path.

    Tested using the test for IP_RECVFRAGSIZE plus

    ip netns exec to ip addr add dev veth1 fc07::1/64
    ip netns exec from ip addr add dev veth0 fc07::2/64

    ip netns exec to ./recv_cmsg_recvfragsize -6 -u -p 6000 &
    ip netns exec from nc -q 1 -u fc07::1 6000 < payload

    Both with and without enabling connection tracking

    ip6tables -A INPUT -m state --state NEW -p udp -j LOG

    Signed-off-by: Willem de Bruijn
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Willem de Bruijn
     

17 May, 2016

1 commit

  • __sock_cmsg_send() might return different error codes, not only -EINVAL.

    Fixes: 24025c465f77 ("ipv4: process socket-level control messages in IPv4")
    Fixes: ad1e46a83716 ("ipv6: process socket-level control messages in IPv6")
    Signed-off-by: Eric Dumazet
    Cc: Soheil Hassas Yeganeh
    Acked-by: Soheil Hassas Yeganeh
    Signed-off-by: David S. Miller

    Eric Dumazet
     

04 May, 2016

1 commit

  • In the sendmsg function of UDP, raw, ICMP and l2tp sockets, we use local
    variables like hlimits, tclass, opt and dontfrag and pass them to corresponding
    functions like ip6_make_skb, ip6_append_data and xxx_push_pending_frames.
    This is not a good practice and makes it hard to add new parameters.
    This fix introduces a new struct ipcm6_cookie similar to ipcm_cookie in
    ipv4 and include the above mentioned variables. And we only pass the
    pointer to this structure to corresponding functions. This makes it easier
    to add new parameters in the future and makes the function cleaner.

    Signed-off-by: Wei Wang
    Signed-off-by: David S. Miller

    Wei Wang
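
The refactoring is an instance of the parameter-object pattern, which can be sketched as follows. `ToyIpcm6Cookie` and `toy_make_skb` are hypothetical; the field names follow the variables listed above, and the default values are assumptions for the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyIpcm6Cookie:
    """Bundle of per-send IPv6 parameters passed around as one object."""
    hlimit: int = -1    # -1 stands for "not set" in this sketch
    tclass: int = -1
    dontfrag: int = -1
    opt: Optional[dict] = None


def toy_make_skb(payload: bytes, cookie: ToyIpcm6Cookie) -> dict:
    """Callee takes one cookie instead of four separate arguments."""
    return {
        "payload": payload,
        "hlimit": cookie.hlimit,
        "tclass": cookie.tclass,
        "dontfrag": cookie.dontfrag,
        "opt": cookie.opt,
    }


packet = toy_make_skb(b"ping", ToyIpcm6Cookie(hlimit=64, tclass=0))
```

Adding a new per-send parameter then means adding one field to the cookie, without changing the signature of every function along the call chain.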
     

26 Apr, 2016

1 commit


24 Apr, 2016

1 commit


15 Apr, 2016

4 commits

  • This patch adds a release_cb for UDPv6. It does a route lookup
    and updates sk->sk_dst_cache if it is needed. It picks up the
    left-over job from ip6_sk_update_pmtu() if the sk was owned
    by user during the pmtu update.

    It takes a rcu_read_lock to protect the __sk_dst_get() operations
    because another thread may do ip6_dst_store() without taking the
    sk lock (e.g. sendmsg).

    Fixes: 45e4fd26683c ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception")
    Signed-off-by: Martin KaFai Lau
    Reported-by: Wei Wang
    Cc: Cong Wang
    Cc: Eric Dumazet
    Cc: Wei Wang
    Signed-off-by: David S. Miller

    Martin KaFai Lau
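
The deferral to a release callback can be sketched with a toy socket. This is a simplification under stated assumptions: `ToyUdp6Sock` is hypothetical, and it stores the new route directly, whereas the patch describes doing a fresh route lookup at release time.

```python
from typing import Optional


class ToyUdp6Sock:
    def __init__(self, dst_cache: str) -> None:
        self.dst_cache = dst_cache
        self.owned_by_user = False
        self.deferred_route: Optional[str] = None

    def pmtu_update(self, new_route: str) -> None:
        """Update the cached route now, or leave the job for release()."""
        if self.owned_by_user:
            self.deferred_route = new_route  # user holds the socket: defer
        else:
            self.dst_cache = new_route

    def release(self) -> None:
        """Release callback: pick up the left-over update, if any."""
        self.owned_by_user = False
        if self.deferred_route is not None:
            self.dst_cache = self.deferred_route
            self.deferred_route = None


sk = ToyUdp6Sock(dst_cache="route-mtu-1500")
sk.owned_by_user = True
sk.pmtu_update("route-mtu-1300")
cache_while_owned = sk.dst_cache
sk.release()
```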
     
  • There is a case in connected UDP socket such that
    getsockopt(IPV6_MTU) will return a stale MTU value. The reproducible
    sequence could be the following:
    1. Create a connected UDP socket
    2. Send some datagrams out
    3. Receive an ICMPV6_PKT_TOOBIG
    4. No new outgoing datagrams to trigger the sk_dst_check()
    logic to update the sk->sk_dst_cache.
    5. getsockopt(IPV6_MTU) returns the mtu from the invalid
    sk->sk_dst_cache instead of the newly created RTF_CACHE clone.

    This patch updates the sk->sk_dst_cache for a connected datagram sk
    during pmtu-update code path.

    Note that the sk->sk_v6_daddr is used to do the route lookup
    instead of skb->data (i.e. iph). This is because a UDP socket can become
    connected after sending out some datagrams in un-connected state, or
    it can be connected multiple times to different destinations. Hence,
    iph may not be related to where sk is currently connected to.

    It is done under '!sock_owned_by_user(sk)' condition because
    the user may make another ip6_datagram_connect() (i.e changing
    the sk->sk_v6_daddr) while dst lookup is happening in the pmtu-update
    code path.

    For the sock_owned_by_user(sk) == true case, the next patch will
    introduce a release_cb() which will update the sk->sk_dst_cache.

    Test:

    Server (Connected UDP Socket):
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Route Details:
    [root@arch-fb-vm1 ~]# ip -6 r show | egrep '2fac'
    2fac::/64 dev eth0 proto kernel metric 256 pref medium
    2fac:face::/64 via 2fac::face dev eth0 metric 1024 pref medium

    A simple python code to create a connected UDP socket:

    import socket
    import errno

    HOST = '2fac::1'
    PORT = 8080

    s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
    s.bind((HOST, PORT))
    s.connect(('2fac:face::face', 53))
    print("connected")
    while True:
        try:
            data = s.recv(1024)
        except socket.error as se:
            if se.errno == errno.EMSGSIZE:
                # 41 is IPPROTO_IPV6 and 24 is IPV6_MTU on Linux
                pmtu = s.getsockopt(41, 24)
                print("PMTU:%d" % pmtu)
                break
    s.close()

    Python program output after getting an ICMPV6_PKT_TOOBIG:
    [root@arch-fb-vm1 ~]# python2 ~/devshare/kernel/tasks/fib6/udp-connect-53-8080.py
    connected
    PMTU:1300

    Cache routes after receiving TOOBIG:
    [root@arch-fb-vm1 ~]# ip -6 r show table cache
    2fac:face::face via 2fac::face dev eth0 metric 0
    cache expires 463sec mtu 1300 pref medium

    Client (Send the ICMPV6_PKT_TOOBIG):
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    scapy is used to generate the TOOBIG message. Here is the scapy script I have
    used:

    >>> p=Ether(src='da:75:4d:36:ac:32', dst='52:54:00:12:34:66', type=0x86dd)/IPv6(src='2fac::face', dst='2fac::1')/ICMPv6PacketTooBig(mtu=1300)/IPv6(src='2fac::1',dst='2fac:face::face', nh='UDP')/UDP(sport=8080,dport=53)
    >>> sendp(p, iface='qemubr0')

    Fixes: 45e4fd26683c ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception")
    Signed-off-by: Martin KaFai Lau
    Reported-by: Wei Wang
    Cc: Cong Wang
    Cc: Eric Dumazet
    Cc: Wei Wang
    Signed-off-by: David S. Miller

    Martin KaFai Lau
     
  • This patch moves the route lookup and update codes for connected
    datagram sk to a newly created function ip6_datagram_dst_update()

    It will be reused during the pmtu update in the later patch.

    Signed-off-by: Martin KaFai Lau
    Cc: Cong Wang
    Cc: Eric Dumazet
    Cc: Wei Wang
    Signed-off-by: David S. Miller

    Martin KaFai Lau
     
  • Move flowi6 init codes for connected datagram sk to a newly created
    function ip6_datagram_flow_key_init().

    Notes:
    1. fl6_flowlabel is used instead of fl6.flowlabel in __ip6_datagram_connect
    2. ipv6_addr_is_multicast(&fl6->daddr) is used instead of
    (addr_type & IPV6_ADDR_MULTICAST) in ip6_datagram_flow_key_init()

    This new function will be reused during pmtu update in the later patch.

    Signed-off-by: Martin KaFai Lau
    Cc: Cong Wang
    Cc: Eric Dumazet
    Cc: Wei Wang
    Signed-off-by: David S. Miller

    Martin KaFai Lau
     

05 Apr, 2016

1 commit

  • Process socket-level control messages by invoking
    __sock_cmsg_send in ip6_datagram_send_ctl for control messages on
    the SOL_SOCKET layer.

    This makes sure whenever ip6_datagram_send_ctl is called for
    udp and raw, we also process socket-level control messages.

    This is a bit uglier than IPv4, since IPv6 does not have
    something like ipcm_cookie. Perhaps we can later create
    a control message cookie for IPv6?

    Note that this commit interprets new control messages that
    were ignored before. As such, this commit does not change
    the behavior of IPv6 control messages.

    Signed-off-by: Soheil Hassas Yeganeh
    Acked-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Soheil Hassas Yeganeh
     

30 Jan, 2016

1 commit

  • Currently, the egress interface index specified via IPV6_PKTINFO
    is ignored by __ip6_datagram_connect(), so that RFC 3542 section 6.7
    can be subverted when the user space application calls connect()
    before sendmsg().
    Fix it by initializing properly flowi6_oif in connect() before
    performing the route lookup.

    Signed-off-by: Paolo Abeni
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Paolo Abeni
     

03 Dec, 2015

1 commit

  • This patch addresses multiple problems :

    UDP/RAW sendmsg() need to get a stable struct ipv6_txoptions
    while socket is not locked : Other threads can change np->opt
    concurrently. Dmitry posted a syzkaller
    (http://github.com/google/syzkaller) program demonstrating
    use-after-free.

    Starting with TCP/DCCP lockless listeners, tcp_v6_syn_recv_sock()
    and dccp_v6_request_recv_sock() also need to use RCU protection
    to dereference np->opt once (before calling ipv6_dup_options())

    This patch adds full RCU protection to np->opt

    Reported-by: Dmitry Vyukov
    Signed-off-by: Eric Dumazet
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Eric Dumazet
     

26 Sep, 2015

1 commit

  • This is to document that socket lock might not be held at this point.

    skb_set_owner_w() and ipv6_local_error() are using proper atomic ops
    or spinlocks, so we promote the socket to non const when calling them.

    netfilter hooks should never assume socket lock is held,
    we also promote the socket to non const.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

30 Jul, 2015

1 commit

  • This patch creates sk_set_txhash and eliminates protocol specific
    inet_set_txhash and ip6_set_txhash. sk_set_txhash simply sets a
    random number instead of performing flow dissection. sk_set_txhash
    is also allowed to be called multiple times for the same socket,
    we'll need this when redoing the hash for negative routing advice.

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     

23 Jul, 2015

1 commit


16 Jul, 2015

1 commit

  • ip6_datagram_connect() is doing a lot of socket changes without
    socket being locked.

    This looks wrong, at least for udp_lib_rehash() which could corrupt
    lists because of concurrent udp_sk(sk)->udp_portaddr_hash accesses.

    Signed-off-by: Eric Dumazet
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Eric Dumazet
     

10 Jul, 2015

1 commit


24 Jun, 2015

1 commit

  • ICMP messages can trigger ICMP and local errors. In this case
    serr->port is 0 and starting from Linux 4.0 we do not return
    the original target address to the error queue readers.
    Add function to define which errors provide addr_offset.
    With this fix my ping command is not silent anymore.

    Fixes: c247f0534cc5 ("ip: fix error queue empty skb handling")
    Signed-off-by: Julian Anastasov
    Acked-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Julian Anastasov
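
One plausible shape for such a predicate is sketched below. The rule itself is an assumption inferred from the description (ICMP and local errors carry an address even when the port is 0); `toy_support_addr` is a hypothetical name. The origin constants follow the values in the Linux `errqueue.h` header.

```python
# Error origins as defined in linux/errqueue.h
SO_EE_ORIGIN_NONE = 0
SO_EE_ORIGIN_LOCAL = 1
SO_EE_ORIGIN_ICMP = 2
SO_EE_ORIGIN_ICMP6 = 3
SO_EE_ORIGIN_TXSTATUS = 4


def toy_support_addr(ee_origin: int, port: int) -> bool:
    """Assumed rule: these origins provide addr_offset, as does any set port."""
    return ee_origin in (
        SO_EE_ORIGIN_ICMP6,
        SO_EE_ORIGIN_ICMP,
        SO_EE_ORIGIN_LOCAL,
    ) or port != 0


icmp_error_without_port = toy_support_addr(SO_EE_ORIGIN_ICMP6, port=0)
timestamp_without_port = toy_support_addr(SO_EE_ORIGIN_TXSTATUS, port=0)
```

Under this rule an ICMP-triggered error with a zero port still returns the original target address, while a bare transmit timestamp does not.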
     

01 Apr, 2015

1 commit

  • The ipv6 code uses a mixture of coding styles. In some instances check for NULL
    pointer is done as x == NULL and sometimes as !x. !x is preferred according to
    checkpatch and this patch makes the code consistent by adopting the latter
    form.

    No changes detected by objdiff.

    Signed-off-by: Ian Morris
    Signed-off-by: David S. Miller

    Ian Morris
     

09 Mar, 2015

1 commit

  • When reading from the error queue, msg_name and msg_control are only
    populated for some errors. A new exception for empty timestamp skbs
    added a false positive on icmp errors without payload.

    `traceroute -M udpconn` only displayed gateways that return payload
    with the icmp error: the embedded network headers are pulled before
    sock_queue_err_skb, leaving an skb with skb->len == 0 otherwise.

    Fix this regression by refining when msg_name and msg_control
    branches are taken. The solutions for the two fields are independent.

    msg_name only makes sense for errors that configure serr->port and
    serr->addr_offset. Test the first instead of skb->len. This also fixes
    another issue. saddr could hold the wrong data, as serr->addr_offset
    is not initialized in some code paths, pointing to the start of the
    network header. It is only valid when serr->port is set (non-zero).

    msg_control support differs between IPv4 and IPv6. IPv4 only honors
    requests for ICMP and timestamps with SOF_TIMESTAMPING_OPT_CMSG. The
    skb->len test can simply be removed, because skb->dev is also tested
    and never true for empty skbs. IPv6 honors requests for all errors
    aside from local errors and timestamps on empty skbs.

    In both cases, make the policy more explicit by moving this logic to
    a new function that decides whether to process msg_control and that
    optionally prepares the necessary fields in skb->cb[]. After this
    change, the IPv4 and IPv6 paths are more similar.

    The last case is rxrpc. Here, simply refine to only match timestamps.

    Fixes: 49ca0d8bfaf3 ("net-timestamp: no-payload option")

    Reported-by: Jan Niehusmann
    Signed-off-by: Willem de Bruijn

    ----

    Changes
    v1->v2
    - fix local origin test inversion in ip6_datagram_support_cmsg
    - make v4 and v6 code paths more similar by introducing analogous
    ipv4_datagram_support_cmsg
    - fix compile bug in rxrpc
    Signed-off-by: David S. Miller

    Willem de Bruijn
     

03 Feb, 2015

1 commit

  • Add timestamping option SOF_TIMESTAMPING_OPT_TSONLY. For transmit
    timestamps, this loops timestamps on top of empty packets.

    Doing so reduces the pressure on SO_RCVBUF. Payload inspection and
    cmsg reception (aside from timestamps) are no longer possible. This
    works together with a follow on patch that allows administrators to
    only allow tx timestamping if it does not loop payload or metadata.

    Signed-off-by: Willem de Bruijn

    ----

    Changes (rfc -> v1)
    - add documentation
    - remove unnecessary skb->len test (thanks to Richard Cochran)
    Signed-off-by: David S. Miller

    Willem de Bruijn
     

16 Jan, 2015

1 commit

  • The sockaddr is returned in IP(V6)_RECVERR as part of errhdr. That
    structure is defined and allocated on the stack as

    struct {
    struct sock_extended_err ee;
    struct sockaddr_in(6) offender;
    } errhdr;

    The second part is only initialized for certain SO_EE_ORIGIN values.
    Always initialize it completely.

    An MTU exceeded error on a SOCK_RAW/IPPROTO_RAW is one example that
    would return uninitialized bytes.

    Signed-off-by: Willem de Bruijn

    ----

    Also verified that there is no padding between errhdr.ee and
    errhdr.offender that could leak additional kernel data.
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Willem de Bruijn
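
The layout claim can be checked from user space with `ctypes`. This is a sketch assuming the usual Linux definitions of `struct sock_extended_err` and `struct sockaddr_in6`; the Python class names are hypothetical mirrors of those structures.

```python
import ctypes


class SockExtendedErr(ctypes.Structure):
    _fields_ = [
        ("ee_errno", ctypes.c_uint32),
        ("ee_origin", ctypes.c_uint8),
        ("ee_type", ctypes.c_uint8),
        ("ee_code", ctypes.c_uint8),
        ("ee_pad", ctypes.c_uint8),
        ("ee_info", ctypes.c_uint32),
        ("ee_data", ctypes.c_uint32),
    ]


class SockaddrIn6(ctypes.Structure):
    _fields_ = [
        ("sin6_family", ctypes.c_uint16),
        ("sin6_port", ctypes.c_uint16),
        ("sin6_flowinfo", ctypes.c_uint32),
        ("sin6_addr", ctypes.c_uint8 * 16),
        ("sin6_scope_id", ctypes.c_uint32),
    ]


class ErrHdr(ctypes.Structure):
    _fields_ = [("ee", SockExtendedErr), ("offender", SockaddrIn6)]


# ctypes zero-fills new instances, which mirrors "always initialize it".
errhdr = ErrHdr()

# Gap between the end of 'ee' and the start of 'offender'.
padding = ErrHdr.offender.offset - ctypes.sizeof(SockExtendedErr)
```

With these definitions `offender` starts immediately after `ee`, so there is no padding that could carry stale bytes.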
     

11 Dec, 2014

1 commit


09 Dec, 2014

1 commit

  • Allow reading of timestamps and cmsg at the same time on all relevant
    socket families. One use is to correlate timestamps with egress
    device, by asking for cmsg IP_PKTINFO.

    On AF_INET sockets, call the relevant function (ip_cmsg_recv). To
    avoid changing legacy expectations, only do so if the caller sets a
    new timestamping flag SOF_TIMESTAMPING_OPT_CMSG.

    On AF_INET6 sockets, IPV6_PKTINFO and all other recv cmsg are already
    returned for all origins. The only change is to set ifindex, which is
    not initialized for all error origins.

    In both cases, only generate the pktinfo message if an ifindex is
    known. This is not the case for ACK timestamps.

    The difference between the protocol families is probably a historical
    accident as a result of the different conditions for generating cmsg
    in the relevant ip(v6)_recv_error function:

    ipv4: if (serr->ee.ee_origin == SO_EE_ORIGIN_ICMP) {
    ipv6: if (serr->ee.ee_origin != SO_EE_ORIGIN_LOCAL) {

    At one time, this was the same test bar for the ICMP/ICMP6
    distinction. This is no longer true.

    Signed-off-by: Willem de Bruijn

    ----

    Changes
    v1 -> v2
    large rewrite
    - integrate with existing pktinfo cmsg generation code
    - on ipv4: only send with new flag, to maintain legacy behavior
    - on ipv6: send at most a single pktinfo cmsg
    - on ipv6: initialize fields if not yet initialized

    The recv cmsg interfaces are also relevant to the discussion of
    whether looping packet headers is problematic. For v6, cmsgs that
    identify many headers are already returned. This patch expands
    that to v4. If it sounds reasonable, I will follow with patches

    1. request timestamps without payload with SOF_TIMESTAMPING_OPT_TSONLY
    (http://patchwork.ozlabs.org/patch/366967/)
    2. sysctl to conditionally drop all timestamps that have payload or
    cmsg from users without CAP_NET_RAW.
    Signed-off-by: David S. Miller

    Willem de Bruijn
     

12 Nov, 2014

1 commit

  • Use the more common dynamic_debug capable net_dbg_ratelimited
    and remove the LIMIT_NETDEBUG macro.

    All messages are still ratelimited.

    Some KERN_ uses are changed to KERN_DEBUG.

    This may have some negative impact on messages that were
    emitted at KERN_INFO that are not enabled at all unless
    DEBUG is defined or dynamic_debug is enabled. Even so,
    these messages are now _not_ emitted by default.

    This also eliminates the use of the net_msg_warn sysctl
    "/proc/sys/net/core/warnings". For backward compatibility,
    the sysctl is not removed, but it has no function. The extern
    declaration of net_msg_warn is removed from sock.h and made
    static in net/core/sysctl_net_core.c

    Miscellanea:

    o Update the sysctl documentation
    o Remove the embedded uses of pr_fmt
    o Coalesce format fragments
    o Realign arguments

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches