18 Feb, 2017

9 commits

  • commit 92e55f412cffd016cc245a74278cb4d7b89bb3bc upstream.

    Unlike ipv4, this control socket is shared by all CPUs, so we cannot use
    it as a scratchpad area to annotate the mark that we pass to ip6_xmit().

    Add a new parameter to ip6_xmit() to indicate the mark. The SCTP socket
    family caches the flowi6 structure in the sctp_transport structure, so
    we cannot use it to carry the mark unless we later reset it, which I
    discarded since it looks ugly to me.

    Fixes: bf99b4ded5f8 ("tcp: fix mark propagation with fwmark_reflect enabled")
    Suggested-by: Eric Dumazet
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Pablo Neira
     
  • commit bf99b4ded5f8a4767dbb9d180626f06c51f9881f upstream.

    Otherwise, RST packets generated by the TCP stack for non-existing
    sockets always have mark 0.
    The mark from the original packet is assigned to the netns_ipv4/6
    socket used to send the response so that it can get copied into the
    response skb when the socket sends it.

    Fixes: e110861f8609 ("net: add a sysctl to reflect the fwmark on replies")
    Cc: Lorenzo Colitti
    Signed-off-by: Pau Espin Pedrol
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Pau Espin Pedrol
     
  • [ Upstream commit 9c8bb163ae784be4f79ae504e78c862806087c54 ]

    In igmpv3/mld_add_delrec() we allocate pmc and put it in idev->mc_tomb,
    so we should free it in del_delrec() once we no longer need it. But I
    incorrectly removed kfree(pmc) in the latest two patches. Now fix it.

    Fixes: 24803f38a5c0 ("igmp: do not remove igmp souce list info when ...")
    Fixes: 1666d49e1d41 ("mld: do not remove mld souce list info when ...")
    Reported-by: Daniel Borkmann
    Signed-off-by: Hangbin Liu
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Hangbin Liu
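The ownership rule restored here can be sketched in userspace C, assuming a simplified singly linked tombstone list. `struct tomb`, `tomb_add()` and `tomb_del()` are hypothetical stand-ins for pmc, the add_delrec() helpers and the del_delrec() helpers; this is not the kernel code. Whatever the add path allocates and links in must be freed by the delete path once it is unlinked.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the pmc records kept on mc_tomb. */
struct tomb {
    int group;              /* multicast group identifier (simplified) */
    struct tomb *next;
};

static int live_records;    /* allocations not yet freed */

/* Like add_delrec(): allocate a record and push it onto the list. */
static int tomb_add(struct tomb **head, int group)
{
    struct tomb *t = malloc(sizeof(*t));
    if (!t)
        return -1;
    t->group = group;
    t->next = *head;
    *head = t;
    live_records++;
    return 0;
}

/* Like del_delrec(): unlink the matching record and free it.
 * Dropping the free() here is the leak that the restored kfree(pmc) closes. */
static int tomb_del(struct tomb **head, int group)
{
    for (struct tomb **pp = head; *pp; pp = &(*pp)->next) {
        if ((*pp)->group == group) {
            struct tomb *t = *pp;
            *pp = t->next;
            /* ... restore source filter state from t here ... */
            free(t);
            live_records--;
            return 1;
        }
    }
    return 0;
}
```

Removing the `free()` call in `tomb_del()` would leave `live_records` nonzero after both records are deleted.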
     
  • [ Upstream commit 1666d49e1d416fcc2cce708242a52fe3317ea8ba ]

    This is an IPv6 version of commit 24803f38a5c0 ("igmp: do not remove igmp
    souce list..."). In mld_del_delrec(), we now restore all source filter
    info instead of flushing it.

    Move mld_clear_delrec() from ipv6_mc_down() to ipv6_mc_destroy_dev(),
    since source list info should not be removed when the link is set down.
    Remove igmp6_group_dropped() from ipv6_mc_destroy_dev(), since it has
    already been called in ipv6_mc_down().

    Also clear all source info after igmp6_group_dropped() instead of inside
    it, because ipv6_mc_down() calls igmp6_group_dropped().

    Signed-off-by: Hangbin Liu
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Hangbin Liu
     
  • [ Upstream commit d7426c69a1942b2b9b709bf66b944ff09f561484 ]

    Dmitry reported a double free in sit_init_net():

    kernel BUG at mm/percpu.c:689!
    invalid opcode: 0000 [#1] SMP KASAN
    Dumping ftrace buffer:
    (ftrace buffer empty)
    Modules linked in:
    CPU: 0 PID: 15692 Comm: syz-executor1 Not tainted 4.10.0-rc6-next-20170206 #1
    Hardware name: Google Google Compute Engine/Google Compute Engine,
    BIOS Google 01/01/2011
    task: ffff8801c9cc27c0 task.stack: ffff88017d1d8000
    RIP: 0010:pcpu_free_area+0x68b/0x810 mm/percpu.c:689
    RSP: 0018:ffff88017d1df488 EFLAGS: 00010046
    RAX: 0000000000010000 RBX: 00000000000007c0 RCX: ffffc90002829000
    RDX: 0000000000010000 RSI: ffffffff81940efb RDI: ffff8801db841d94
    RBP: ffff88017d1df590 R08: dffffc0000000000 R09: 1ffffffff0bb3bdd
    R10: dffffc0000000000 R11: 00000000000135dd R12: ffff8801db841d80
    R13: 0000000000038e40 R14: 00000000000007c0 R15: 00000000000007c0
    FS: 00007f6ea608f700(0000) GS:ffff8801dbe00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 000000002000aff8 CR3: 00000001c8d44000 CR4: 00000000001426f0
    DR0: 0000000020000000 DR1: 0000000020000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
    Call Trace:
    free_percpu+0x212/0x520 mm/percpu.c:1264
    ipip6_dev_free+0x43/0x60 net/ipv6/sit.c:1335
    sit_init_net+0x3cb/0xa10 net/ipv6/sit.c:1831
    ops_init+0x10a/0x530 net/core/net_namespace.c:115
    setup_net+0x2ed/0x690 net/core/net_namespace.c:291
    copy_net_ns+0x26c/0x530 net/core/net_namespace.c:396
    create_new_namespaces+0x409/0x860 kernel/nsproxy.c:106
    unshare_nsproxy_namespaces+0xae/0x1e0 kernel/nsproxy.c:205
    SYSC_unshare kernel/fork.c:2281 [inline]
    SyS_unshare+0x64e/0xfc0 kernel/fork.c:2231
    entry_SYSCALL_64_fastpath+0x1f/0xc2

    This is because when tunnel->dst_cache init fails, we free dev->tstats
    once in ipip6_tunnel_init() and a second time in sit_init_net(). This
    looks redundant, but its ndo_uninit() does not seem to be enough to clean
    up everything here. So avoid the double free by setting dev->tstats to
    NULL after the first free, at least for -net.

    Reported-by: Dmitry Vyukov
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    WANG Cong
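A minimal userspace sketch of the fix pattern, assuming a hypothetical `struct dev` whose `tstats` field stands in for dev->tstats, and two hypothetical cleanup paths that both free it. Clearing the pointer after the first free turns the second free into a no-op: free(NULL) does nothing in C, and the kernel's free_percpu() likewise returns early on NULL.

```c
#include <stdlib.h>

/* Hypothetical device with a per-device stats buffer (stand-in for dev->tstats). */
struct dev {
    unsigned long *tstats;
};

/* First cleanup path, e.g. the tunnel init error path. */
static void init_error_cleanup(struct dev *d)
{
    free(d->tstats);
    d->tstats = NULL;       /* the fix: any later free becomes harmless */
}

/* Second cleanup path, e.g. the per-namespace error path. */
static void dev_free(struct dev *d)
{
    free(d->tstats);        /* free(NULL) is a no-op */
    d->tstats = NULL;
}
```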
     
  • [ Upstream commit ebf6c9cb23d7e56eec8575a88071dec97ad5c6e2 ]

    Dmitry reported a use-after-free in ip6_datagram_recv_specific_ctl().

    A similar bug was fixed in commit 8ce48623f0cf ("ipv6: tcp: restore
    IP6CB for pktoptions skbs"), but I missed another spot.

    tcp_v6_syn_recv_sock() can indeed set np->pktoptions from ireq->pktopts.

    Fixes: 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
    Signed-off-by: Eric Dumazet
    Reported-by: Dmitry Vyukov
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit 7892032cfe67f4bde6fc2ee967e45a8fbaf33756 ]

    Andrey Konovalov reported out-of-bounds accesses in ip6gre_err().

    If GRE flags contain GRE_KEY, the following expression
    *(((__be32 *)p) + (grehlen / 4) - 1)

    accesses data ~40 bytes after the expected point, since
    grehlen includes the size of IPv6 headers.

    Let's use a "struct gre_base_hdr *greh" pointer to make this
    code more readable.

    p[1] becomes greh->protocol.
    grehlen is the GRE header length.

    Fixes: c12b395a4664 ("gre: Support GRE over IPv6")
    Signed-off-by: Eric Dumazet
    Reported-by: Andrey Konovalov
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
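The corrected offset arithmetic can be sketched in userspace C, assuming a buffer that starts at the GRE header itself rather than at the outer IPv6 header. `gre_get_key()` and the `GRE_*_BIT` constants are hypothetical; `struct gre_base_hdr` mirrors the 4-byte flags/protocol layout the commit refers to. Per RFC 2890, the optional 4-byte checksum/reserved field precedes the key, so the key sits at offset 4 or 8 from the start of the GRE header, never at an offset that also counts the IPv6 headers.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

#define GRE_CSUM_BIT 0x8000     /* C bit, host order (hypothetical name) */
#define GRE_KEY_BIT  0x2000     /* K bit, host order (hypothetical name) */

/* 4-byte base header: flags then protocol, both big-endian. */
struct gre_base_hdr {
    uint16_t flags;
    uint16_t protocol;
};

/* `gre` points at the GRE header, with `len` bytes available from there.
 * On success, stores the key (network order) and returns 0. */
static int gre_get_key(const uint8_t *gre, size_t len, uint32_t *key)
{
    struct gre_base_hdr greh;
    size_t off = sizeof(greh);

    if (len < sizeof(greh))
        return -1;
    memcpy(&greh, gre, sizeof(greh));
    uint16_t flags = ntohs(greh.flags);
    if (!(flags & GRE_KEY_BIT))
        return -1;
    if (flags & GRE_CSUM_BIT)
        off += 4;               /* checksum + reserved1 precede the key */
    if (len < off + 4)
        return -1;              /* key not fully inside the buffer */
    memcpy(key, gre + off, 4);
    return 0;
}
```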
     
  • [ Upstream commit 63117f09c768be05a0bf465911297dc76394f686 ]

    Casting is a high-precedence operation, but "off" and "i" are in terms of
    bytes, so we need some parentheses here.

    Fixes: fbfa743a9d2a ("ipv6: fix ip6_tnl_parse_tlv_enc_lim()")
    Signed-off-by: Dan Carpenter
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Dan Carpenter
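The precedence problem can be reproduced in a few lines, assuming a hypothetical 4-byte `struct opt_hdr` in place of the kernel's option structure. A cast binds tighter than `+`, so `(T *)raw + off` advances by `off * sizeof(T)` bytes, while `(T *)(raw + off)` advances by `off` bytes as intended.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-byte option header. */
struct opt_hdr {
    uint8_t type;
    uint8_t len;
    uint8_t data[2];
};

/* Buggy: the cast happens first, so `off` is scaled by sizeof(struct opt_hdr). */
static const struct opt_hdr *opt_at_buggy(const uint8_t *raw, size_t off)
{
    return (const struct opt_hdr *)raw + off;
}

/* Fixed: do the byte arithmetic first, then cast. */
static const struct opt_hdr *opt_at_fixed(const uint8_t *raw, size_t off)
{
    return (const struct opt_hdr *)(raw + off);
}
```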
     
  • [ Upstream commit fbfa743a9d2a0ffa24251764f10afc13eb21e739 ]

    This function suffers from multiple issues.

    First one is that pskb_may_pull() may reallocate skb->head,
    so the 'raw' pointer needs either to be reloaded or not used at all.

    Second issue is that NEXTHDR_DEST handling does not validate
    that the options are present in skb->data, so we might read
    garbage or access non-existent memory.

    With help from Willem de Bruijn.

    Signed-off-by: Eric Dumazet
    Reported-by: Dmitry Vyukov
    Cc: Willem de Bruijn
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
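The first issue, a stale pointer after a call that may reallocate the buffer, can be sketched in userspace C. `struct buf` and `buf_may_pull()` are hypothetical; `buf_may_pull()` only models pskb_may_pull() by growing a heap buffer with realloc() and zero-filling the new bytes, which the real function does not do (it pulls data in from fragments). The point is that any pointer derived from the buffer before the call must be re-derived after it, and reads must stay within the length the call guaranteed.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer standing in for an skb's linear data. */
struct buf {
    unsigned char *data;
    size_t len;             /* bytes currently available */
};

/* Model of pskb_may_pull(): ensure `need` bytes, possibly moving the buffer. */
static int buf_may_pull(struct buf *b, size_t need)
{
    if (need <= b->len)
        return 1;
    unsigned char *p = realloc(b->data, need);
    if (!p)
        return 0;
    memset(p + b->len, 0, need - b->len);
    b->data = p;            /* pointers saved before this call may now dangle */
    b->len = need;
    return 1;
}

/* Read the byte at `off`, re-deriving the pointer after the pull. */
static int read_opt_byte(struct buf *b, size_t off)
{
    if (!buf_may_pull(b, off + 1))
        return -1;
    const unsigned char *raw = b->data;     /* reload, never reuse an old copy */
    return raw[off];
}
```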
     

04 Feb, 2017

5 commits

  • [ Upstream commit 88ff7334f25909802140e690c0e16433e485b0a0 ]

    Modules implementing lwtunnel ops should not be allowed to unload
    while there is state alive using those ops, so specify the owning
    module for all lwtunnel ops.

    Signed-off-by: Robert Shearman
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Robert Shearman
     
  • [ Upstream commit 03e4deff4987f79c34112c5ba4eb195d4f9382b0 ]

    Just like commit 4acd4945cd1e ("ipv6: addrconf: Avoid calling
    netdevice notifiers with RCU read-side lock"), it is unnecessary
    for addrconf_disable_change() to use RCU iteration over the
    netdev list, since it already holds the RTNL lock; otherwise we may
    hit "Illegal context switch in RCU read-side critical section".

    Signed-off-by: Kefeng Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Kefeng Wang
     
  • [ Upstream commit 9ed59592e3e379b2e9557dc1d9e9ec8fcbb33f16 ]

    Trying to add an mpls encap route when the MPLS modules are not loaded
    hangs. For example:

    CONFIG_MPLS=y
    CONFIG_NET_MPLS_GSO=m
    CONFIG_MPLS_ROUTING=m
    CONFIG_MPLS_IPTUNNEL=m

    $ ip route add 10.10.10.10/32 encap mpls 100 via inet 10.100.1.2

    The ip command hangs:
    root 880 826 0 21:25 pts/0 00:00:00 ip route add 10.10.10.10/32 encap mpls 100 via inet 10.100.1.2

    $ cat /proc/880/stack
    [] call_usermodehelper_exec+0xd6/0x134
    [] __request_module+0x27b/0x30a
    [] lwtunnel_build_state+0xe4/0x178
    [] fib_create_info+0x47f/0xdd4
    [] fib_table_insert+0x90/0x41f
    [] inet_rtm_newroute+0x4b/0x52
    ...

    modprobe is trying to load rtnl-lwt-MPLS:

    root 881 5 0 21:25 ? 00:00:00 /sbin/modprobe -q -- rtnl-lwt-MPLS

    and it hangs after loading mpls_router:

    $ cat /proc/881/stack
    [] rtnl_lock+0x12/0x14
    [] register_netdevice_notifier+0x16/0x179
    [] mpls_init+0x25/0x1000 [mpls_router]
    [] do_one_initcall+0x8e/0x13f
    [] do_init_module+0x5a/0x1e5
    [] load_module+0x13bd/0x17d6
    ...

    The problem is that lwtunnel_build_state is called with rtnl lock
    held preventing mpls_init from registering.

    Given the potential references held by the time lwtunnel_build_state is
    called, it cannot drop the rtnl lock to load the module. So, extract the module
    loading code from lwtunnel_build_state into a new function to validate
    the encap type. The new function is called while converting the user
    request into a fib_config which is well before any table, device or
    fib entries are examined.

    Fixes: 745041e2aaf1 ("lwtunnel: autoload of lwt modules")
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    David Ahern
     
  • [ Upstream commit 02ca0423fd65a0a9c4d70da0dbb8f4b8503f08c7 ]

    With ip6gre we have a tunnel header which also makes the tunnel MTU
    smaller. We need to reserve room for it. Previously we were using up
    space reserved for the Tunnel Encapsulation Limit option
    header (RFC 2473).

    Also, after commit b05229f44228 ("gre6: Cleanup GREv6 transmit path,
    call common GRE functions") our contract with the caller has
    changed. Now we check if the packet length exceeds the tunnel MTU after
    the tunnel header has been pushed, unlike before.

    This is reflected in the check where we look at the packet length minus
    the size of the tunnel header, which is already accounted for in tunnel
    MTU.

    Fixes: b05229f44228 ("gre6: Cleanup GREv6 transmit path, call common GRE functions")
    Signed-off-by: Jakub Sitnicki
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Jakub Sitnicki
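The accounting can be illustrated with a small arithmetic model, assuming a 40-byte outer IPv6 header, an 8-byte Tunnel Encapsulation Limit destination option when it is enabled, and a packet length that already includes the pushed GRE header. `struct tnl`, `tnl_mtu()` and `tnl_too_big()` are hypothetical names, not the ip6gre code. Because the tunnel MTU already has the tunnel header subtracted, the header must also be subtracted from the packet length before comparing; otherwise it is counted twice.

```c
#include <stdbool.h>

/* Hypothetical tunnel parameters. */
struct tnl {
    unsigned int mtu;       /* tunnel MTU, tunnel header already subtracted */
    unsigned int tun_hlen;  /* bytes of GRE header pushed onto the packet */
};

/* Tunnel MTU for a given link MTU: outer IPv6 header (40 bytes), GRE header,
 * and an optional 8-byte encapsulation limit option (assumed sizes). */
static unsigned int tnl_mtu(unsigned int link_mtu, unsigned int tun_hlen,
                            bool encap_limit)
{
    return link_mtu - 40 - tun_hlen - (encap_limit ? 8 : 0);
}

/* pkt_len includes the pushed tunnel header, so remove it before comparing. */
static bool tnl_too_big(const struct tnl *t, unsigned int pkt_len)
{
    return pkt_len - t->tun_hlen > t->mtu;
}
```

With a 1500-byte link, a 4-byte GRE header and the encapsulation limit option enabled, this gives a tunnel MTU of 1448 bytes; a 1452-byte packet (including the 4-byte GRE header) fits, and a 1453-byte one does not.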
     
  • [ Upstream commit ea7a80858f57d8878b1499ea0f1b8a635cc48de7 ]

    Handle failure in lwtunnel_fill_encap adding attributes to skb.

    Fixes: 571e722676fe ("ipv4: support for fib route lwtunnel encap attributes")
    Fixes: 19e42e451506 ("ipv6: support for fib route lwtunnel encap attributes")
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    David Ahern
     

15 Jan, 2017

3 commits

  • [ Upstream commit 57ea52a865144aedbcd619ee0081155e658b6f7d ]

    The GRO fast path caches the frag0 address. This address becomes
    invalid if frag0 is modified by pskb_may_pull or its variants.
    So whenever that happens we must disable the frag0 optimization.

    This is usually done through the combination of gro_header_hard
    and gro_header_slow; however, the IPv6 extension header path did
    the pulling directly and would continue to use the GRO fast path
    incorrectly.

    This patch fixes it by disabling the fast path when we enter the
    IPv6 extension header path.

    Fixes: 78a478d0efd9 ("gro: Inline skb_gro_header and cache frag0 virtual address")
    Reported-by: Slava Shwartsman
    Signed-off-by: Herbert Xu
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Herbert Xu
     
  • [ Upstream commit a98f91758995cb59611e61318dddd8a6956b52c3 ]

    By setting certain socket options on ipv6 raw sockets, we can confuse the
    length calculation in rawv6_push_pending_frames triggering a BUG_ON.

    RIP: 0010:[] [] rawv6_sendmsg+0xc30/0xc40
    RSP: 0018:ffff881f6c4a7c18 EFLAGS: 00010282
    RAX: 00000000fffffff2 RBX: ffff881f6c681680 RCX: 0000000000000002
    RDX: ffff881f6c4a7cf8 RSI: 0000000000000030 RDI: ffff881fed0f6a00
    RBP: ffff881f6c4a7da8 R08: 0000000000000000 R09: 0000000000000009
    R10: ffff881fed0f6a00 R11: 0000000000000009 R12: 0000000000000030
    R13: ffff881fed0f6a00 R14: ffff881fee39ba00 R15: ffff881fefa93a80

    Call Trace:
    [] ? unmap_page_range+0x693/0x830
    [] inet_sendmsg+0x67/0xa0
    [] sock_sendmsg+0x38/0x50
    [] SYSC_sendto+0xef/0x170
    [] SyS_sendto+0xe/0x10
    [] do_syscall_64+0x50/0xa0
    [] entry_SYSCALL64_slow_path+0x25/0x25

    Handle by jumping to the failure path if skb_copy_bits gets an EFAULT.

    Reproducer:

    #include <string.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    #define LEN 504

    int main(int argc, char* argv[])
    {
            int fd;
            int zero = 0;
            char buf[LEN];

            memset(buf, 0, LEN);

            fd = socket(AF_INET6, SOCK_RAW, 7);

            setsockopt(fd, SOL_IPV6, IPV6_CHECKSUM, &zero, 4);
            setsockopt(fd, SOL_IPV6, IPV6_DSTOPTS, &buf, LEN);

            sendto(fd, buf, 1, 0, (struct sockaddr *) buf, 110);
    }

    Signed-off-by: Dave Jones
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Dave Jones
     
  • [ Upstream commit 39b2dd765e0711e1efd1d1df089473a8dd93ad48 ]

    Socket cmsg IP(V6)_RECVORIGDSTADDR checks that port range lies within
    the packet. For sockets that have transport headers pulled, transport
    offset can be negative. Use signed comparison to avoid overflow.

    Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing")
    Reported-by: Nisar Jagabar
    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Willem de Bruijn
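The overflow comes from C's usual arithmetic conversions: comparing an `int` with an `unsigned int` converts the `int` to unsigned, so a negative offset becomes a very large value. The sketch below uses hypothetical function names and a simplified form of the port-range check (the 4 bytes holding the source and destination ports must lie inside the packet).

```c
#include <stdbool.h>

/* transport_off may be negative once the transport header has been pulled. */
static bool ports_in_packet_unsigned(int transport_off, unsigned int len)
{
    /* transport_off + 4 is converted to unsigned: -4 becomes ~4 billion */
    return !(transport_off + 4 > len);
}

static bool ports_in_packet_signed(int transport_off, unsigned int len)
{
    /* both sides are int, so negative offsets compare correctly */
    return !(transport_off + 4 > (int)len);
}
```

With a transport offset of -8 and a 100-byte packet, the unsigned version wrongly rejects the packet, while the signed version accepts it.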
     

03 Dec, 2016

3 commits

  • segs needs to be checked for being NULL in ipv6_gso_segment() before calling
    skb_shinfo(segs), otherwise kernel can run into a NULL-pointer dereference:

    [ 97.811262] BUG: unable to handle kernel NULL pointer dereference at 00000000000000cc
    [ 97.819112] IP: [] ipv6_gso_segment+0x119/0x2f0
    [ 97.825214] PGD 0 [ 97.827047]
    [ 97.828540] Oops: 0000 [#1] SMP
    [ 97.831678] Modules linked in: vhost_net vhost macvtap macvlan nfsv3 rpcsec_gss_krb5
    nfsv4 dns_resolver nfs fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4
    iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack
    ipt_REJECT nf_reject_ipv4 tun ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter
    bridge stp llc snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel
    snd_hda_codec edac_mce_amd snd_hda_core edac_core snd_hwdep kvm_amd snd_seq kvm snd_seq_device
    snd_pcm irqbypass snd_timer ppdev parport_serial snd parport_pc k10temp pcspkr soundcore parport
    sp5100_tco shpchp sg wmi i2c_piix4 acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc
    ip_tables xfs libcrc32c sr_mod cdrom sd_mod ata_generic pata_acpi amdkfd amd_iommu_v2 radeon
    broadcom bcm_phy_lib i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
    ttm ahci serio_raw tg3 firewire_ohci libahci pata_atiixp drm ptp libata firewire_core pps_core
    i2c_core crc_itu_t fjes dm_mirror dm_region_hash dm_log dm_mod
    [ 97.927721] CPU: 1 PID: 3504 Comm: vhost-3495 Not tainted 4.9.0-7.el7.test.x86_64 #1
    [ 97.935457] Hardware name: AMD Snook/Snook, BIOS ESK0726A 07/26/2010
    [ 97.941806] task: ffff880129a1c080 task.stack: ffffc90001bcc000
    [ 97.947720] RIP: 0010:[] [] ipv6_gso_segment+0x119/0x2f0
    [ 97.956251] RSP: 0018:ffff88012fc43a10 EFLAGS: 00010207
    [ 97.961557] RAX: 0000000000000000 RBX: ffff8801292c8700 RCX: 0000000000000594
    [ 97.968687] RDX: 0000000000000593 RSI: ffff880129a846c0 RDI: 0000000000240000
    [ 97.975814] RBP: ffff88012fc43a68 R08: ffff880129a8404e R09: 0000000000000000
    [ 97.982942] R10: 0000000000000000 R11: ffff880129a84076 R12: 00000020002949b3
    [ 97.990070] R13: ffff88012a580000 R14: 0000000000000000 R15: ffff88012a580000
    [ 97.997198] FS: 0000000000000000(0000) GS:ffff88012fc40000(0000) knlGS:0000000000000000
    [ 98.005280] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 98.011021] CR2: 00000000000000cc CR3: 0000000126c5d000 CR4: 00000000000006e0
    [ 98.018149] Stack:
    [ 98.020157] 00000000ffffffff ffff88012fc43ac8 ffffffffa017ad0a 000000000000000e
    [ 98.027584] 0000001300000000 0000000077d59998 ffff8801292c8700 00000020002949b3
    [ 98.035010] ffff88012a580000 0000000000000000 ffff88012a580000 ffff88012fc43a98
    [ 98.042437] Call Trace:
    [ 98.044879] [ 98.046803] [] ? tg3_start_xmit+0x84a/0xd60 [tg3]
    [ 98.053156] [] skb_mac_gso_segment+0xb0/0x130
    [ 98.059158] [] __skb_gso_segment+0x73/0x110
    [ 98.064985] [] validate_xmit_skb+0x12d/0x2b0
    [ 98.070899] [] validate_xmit_skb_list+0x42/0x70
    [ 98.077073] [] sch_direct_xmit+0xd0/0x1b0
    [ 98.082726] [] __dev_queue_xmit+0x486/0x690
    [ 98.088554] [] ? cpumask_next_and+0x35/0x50
    [ 98.094380] [] dev_queue_xmit+0x10/0x20
    [ 98.099863] [] br_dev_queue_push_xmit+0xa7/0x170 [bridge]
    [ 98.106907] [] br_forward_finish+0x41/0xc0 [bridge]
    [ 98.113430] [] ? nf_iterate+0x52/0x60
    [ 98.118735] [] ? nf_hook_slow+0x6b/0xc0
    [ 98.124216] [] __br_forward+0x14c/0x1e0 [bridge]
    [ 98.130480] [] ? br_dev_queue_push_xmit+0x170/0x170 [bridge]
    [ 98.137785] [] br_forward+0x9d/0xb0 [bridge]
    [ 98.143701] [] br_handle_frame_finish+0x267/0x560 [bridge]
    [ 98.150834] [] br_handle_frame+0x174/0x2f0 [bridge]
    [ 98.157355] [] ? sched_clock+0x9/0x10
    [ 98.162662] [] ? sched_clock_cpu+0x72/0xa0
    [ 98.168403] [] __netif_receive_skb_core+0x1e5/0xa20
    [ 98.174926] [] ? timerqueue_add+0x59/0xb0
    [ 98.180580] [] __netif_receive_skb+0x18/0x60
    [ 98.186494] [] process_backlog+0x95/0x140
    [ 98.192145] [] net_rx_action+0x16d/0x380
    [ 98.197713] [] __do_softirq+0xd1/0x283
    [ 98.203106] [] do_softirq_own_stack+0x1c/0x30
    [ 98.209107] [ 98.211029] [] do_softirq+0x50/0x60
    [ 98.216166] [] netif_rx_ni+0x33/0x80
    [ 98.221386] [] tun_get_user+0x487/0x7f0 [tun]
    [ 98.227388] [] tun_sendmsg+0x4b/0x60 [tun]
    [ 98.233129] [] handle_tx+0x282/0x540 [vhost_net]
    [ 98.239392] [] handle_tx_kick+0x15/0x20 [vhost_net]
    [ 98.245916] [] vhost_worker+0x9e/0xf0 [vhost]
    [ 98.251919] [] ? vhost_umem_alloc+0x40/0x40 [vhost]
    [ 98.258440] [] ? do_syscall_64+0x67/0x180
    [ 98.264094] [] kthread+0xd9/0xf0
    [ 98.268965] [] ? kthread_park+0x60/0x60
    [ 98.274444] [] ret_from_fork+0x25/0x30
    [ 98.279836] Code: 8b 93 d8 00 00 00 48 2b 93 d0 00 00 00 4c 89 e6 48 89 df 66 89 93 c2 00 00 00 ff 10 48 3d 00 f0 ff ff 49 89 c2 0f 87 52 01 00 00 8b 92 cc 00 00 00 48 8b 80 d0 00 00 00 44 0f b7 74 10 06 66
    [ 98.299425] RIP [] ipv6_gso_segment+0x119/0x2f0
    [ 98.305612] RSP
    [ 98.309094] CR2: 00000000000000cc
    [ 98.312406] ---[ end trace 726a2c7a2d2d78d0 ]---

    Signed-off-by: Artem Savkov
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Artem Savkov
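In the kernel, pointer-returning functions can return NULL or an error encoded with ERR_PTR(), so a caller that only tests IS_ERR() can still dereference NULL. The sketch below re-creates simplified userspace versions of the include/linux/err.h helpers, plus a hypothetical `first_seg_gso_type()` that rejects both cases before dereferencing, which is the check needed before skb_shinfo(segs).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace re-creations of the kernel's ERR_PTR helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }

static inline bool IS_ERR(const void *ptr)
{
    /* error pointers occupy the top MAX_ERRNO addresses */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline bool IS_ERR_OR_NULL(const void *ptr)
{
    return !ptr || IS_ERR(ptr);
}

/* Hypothetical segment list head. */
struct segs { unsigned int gso_type; };

/* Checking only IS_ERR() here would let NULL through to the dereference. */
static long first_seg_gso_type(const struct segs *segs)
{
    if (IS_ERR_OR_NULL(segs))
        return -1;
    return (long)segs->gso_type;
}
```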
     
  • This reverts commit ae148b085876fa771d9ef2c05f85d4b4bf09ce0d
    ("ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()").

    skb->protocol is now set in __ip_local_out() and __ip6_local_out() before
    dst_output() is called. It is no longer necessary to do it for each tunnel.

    Cc: stable@vger.kernel.org
    Signed-off-by: Eli Cooper
    Signed-off-by: David S. Miller

    Eli Cooper
     
  • When xfrm is applied to TSO/GSO packets, it follows this path:

    xfrm_output() -> xfrm_output_gso() -> skb_gso_segment()

    where skb_gso_segment() relies on skb->protocol to function properly.

    This patch sets skb->protocol to ETH_P_IPV6 before dst_output() is called,
    fixing a bug where GSO packets sent through an ipip6 tunnel are dropped
    when xfrm is involved.

    Cc: stable@vger.kernel.org
    Signed-off-by: Eli Cooper
    Signed-off-by: David S. Miller

    Eli Cooper
     

02 Dec, 2016

2 commits

  • Steffen Klassert says:

    ====================
    pull request (net): ipsec 2016-12-01

    1) Change the error value when someone tries to run 32bit
    userspace on a 64bit host from -ENOTSUPP to the userspace
    exported -EOPNOTSUPP. Fix from Yi Zhao.

    2) On inbound, ESN sequence numbers are already in network
    byte order. So don't try to convert it again, this fixes
    integrity verification for ESN. Fixes from Tobias Brunner.

    Please pull or let me know if there are problems.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Pablo Neira Ayuso says:

    ====================
    Netfilter fixes for net

    This is a large batch of Netfilter fixes for net, they are:

    1) Three patches to fix the NAT conversion to rhashtable: switch to the
    rhlist structure, which allows several objects with the same key.
    Moreover, fix the wrong comparison logic in nf_nat_bysource_cmp(), as it
    is expected to return a value similar to memcmp(). Change the location of
    the nat_bysource field in the nf_conn structure to avoid zeroing it, as
    that breaks interaction with SLAB_DESTROY_BY_RCU and leads to crashes.
    From Florian Westphal.

    2) Don't allow malformed fragments to go through in IPv6; drop them,
    otherwise we hit a GPF. Patch from Florian Westphal.

    3) Fix crash if attributes are missing in nft_range, from Liping Zhang.

    4) Fix arptables 32-bits userspace 64-bits kernel compat, from Hongxu Jia.

    5) Two patches to fix netfilter interaction with vrf, from David Ahern.

    6) Fix element timeout calculation in nf_tables, we take milliseconds
    from userspace, but we use jiffies from kernelspace. Patch from
    Anders K. Pedersen.

    7) Add missing netlink attribute length validation for nft_hash, from
    Laura Garcia.

    8) Fix nf_conntrack_helper documentation: we have not defaulted to off
    for some time, so let's get this in sync with the code.

    I know it is late, but I think these are important, specifically the NAT
    bits, as they mostly address fallout from recent changes. I also read
    there is a chance of an -rc8; if that is the case, that would also give
    us a bit more time to test this.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

01 Dec, 2016

1 commit


30 Nov, 2016

2 commits

  • When handling inbound packets, the two halves of the sequence number
    stored on the skb are already in network order.

    Fixes: 000ae7b2690e ("esp6: Switch to new AEAD interface")
    Signed-off-by: Tobias Brunner
    Acked-by: Herbert Xu
    Signed-off-by: Steffen Klassert

    Tobias Brunner
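The bug class is a double byte-order conversion. In the hypothetical sketch below, `seq_hi_net` is one 32-bit half of the ESN sequence number, already in network byte order as on the inbound path. Copying it as-is produces the right bytes, while applying htonl() again reverses them on little-endian hosts (on big-endian hosts htonl() is the identity, so the bug stays hidden there).

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Correct: the value is already in network order, so copy it unchanged. */
static void put_seq_half_correct(uint8_t out[4], uint32_t seq_hi_net)
{
    memcpy(out, &seq_hi_net, 4);
}

/* Buggy: converts a second time, scrambling the bytes on little-endian hosts. */
static void put_seq_half_buggy(uint8_t out[4], uint32_t seq_hi_net)
{
    uint32_t v = htonl(seq_hi_net);
    memcpy(out, &v, 4);
}
```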
     
  • Dmitry Vyukov reported GPF in network stack that Andrey traced down to
    negative nh offset in nf_ct_frag6_queue().

    The problem is that all network headers before the fragment header are pulled.
    Normal ipv6 reassembly will drop the skb when errors occur further down
    the line.

    netfilter doesn't do this, and instead passes the original fragment
    along. That was also fine back when netfilter ipv6 defrag worked with
    cloned fragments, as the original, pristine fragment was passed on.

    So we either have to undo the pull op, or discard such fragments.
    Since they're malformed after all (e.g. overlapping fragment) it seems
    preferable to just drop them.

    Same for temporary errors -- it doesn't make sense to accept (and
    perhaps forward!) only some fragments of same datagram.

    Fixes: 029f7f3b8701cc7ac ("netfilter: ipv6: nf_defrag: avoid/free clone operations")
    Reported-by: Dmitry Vyukov
    Debugged-by: Andrey Konovalov
    Diagnosed-by: Eric Dumazet
    Signed-off-by: Florian Westphal
    Acked-by: Eric Dumazet
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

29 Nov, 2016

1 commit

  • Andrey reported the following while fuzzing the kernel with syzkaller:

    kasan: CONFIG_KASAN_INLINE enabled
    kasan: GPF could be caused by NULL-ptr deref or user memory access
    general protection fault: 0000 [#1] SMP KASAN
    Modules linked in:
    CPU: 0 PID: 3859 Comm: a.out Not tainted 4.9.0-rc6+ #429
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    task: ffff8800666d4200 task.stack: ffff880067348000
    RIP: 0010:[] []
    icmp6_send+0x5fc/0x1e30 net/ipv6/icmp.c:451
    RSP: 0018:ffff88006734f2c0 EFLAGS: 00010206
    RAX: ffff8800666d4200 RBX: 0000000000000000 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: dffffc0000000000 RDI: 0000000000000018
    RBP: ffff88006734f630 R08: ffff880064138418 R09: 0000000000000003
    R10: dffffc0000000000 R11: 0000000000000005 R12: 0000000000000000
    R13: ffffffff84e7e200 R14: ffff880064138484 R15: ffff8800641383c0
    FS: 00007fb3887a07c0(0000) GS:ffff88006cc00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000020000000 CR3: 000000006b040000 CR4: 00000000000006f0
    Stack:
    ffff8800666d4200 ffff8800666d49f8 ffff8800666d4200 ffffffff84c02460
    ffff8800666d4a1a 1ffff1000ccdaa2f ffff88006734f498 0000000000000046
    ffff88006734f440 ffffffff832f4269 ffff880064ba7456 0000000000000000
    Call Trace:
    [] icmpv6_param_prob+0x2c/0x40 net/ipv6/icmp.c:557
    [< inline >] ip6_tlvopt_unknown net/ipv6/exthdrs.c:88
    [] ip6_parse_tlv+0x555/0x670 net/ipv6/exthdrs.c:157
    [] ipv6_parse_hopopts+0x199/0x460 net/ipv6/exthdrs.c:663
    [] ipv6_rcv+0xfa3/0x1dc0 net/ipv6/ip6_input.c:191
    ...

    icmp6_send / icmpv6_send is invoked for both rx and tx paths. In both
    cases the dst->dev should be preferred for determining the L3 domain
    if the dst has been set on the skb. Fall back to skb->dev if it has
    not. This covers the case reported here where icmp6_send is invoked on
    Rx before the route lookup.

    Fixes: 5d41ce29e ("net: icmp6_send should use dst dev to determine L3 domain")
    Reported-by: Andrey Konovalov
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

28 Nov, 2016

1 commit

  • Steffen Klassert says:

    ====================
    pull request (net): ipsec 2016-11-25

    1) Fix a refcount leak in vti6.
    From Nicolas Dichtel.

    2) Fix a wrong if statement in xfrm_sk_policy_lookup.
    From Florian Westphal.

    3) The flowcache watermarks are per cpu. Take this into
    account when comparing to the threshold where we
    refuse new allocations. From Miroslav Urbanek.

    Please pull or let me know if there are problems.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

25 Nov, 2016

2 commits

  • In commits 93821778def10 ("udp: Fix rcv socket locking") and
    f7ad74fef3af ("net/ipv6/udp: UDP encapsulation: break backlog_rcv into
    __udpv6_queue_rcv_skb") UDP backlog handlers were renamed, but UDPlite
    was forgotten.

    This leads to crashes if UDPlite header is pulled twice, which happens
    starting from commit e6afc8ace6dd ("udp: remove headers from UDP packets
    before queueing")

    Bug found by the syzkaller team, thanks a lot, guys!

    Note that backlog use in UDP/UDPlite is scheduled to be removed starting
    from linux-4.10, so this patch is only needed up to linux-4.9.

    Fixes: 93821778def1 ("udp: Fix rcv socket locking")
    Fixes: f7ad74fef3af ("net/ipv6/udp: UDP encapsulation: break backlog_rcv into __udpv6_queue_rcv_skb")
    Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing")
    Signed-off-by: Eric Dumazet
    Reported-by: Andrey Konovalov
    Cc: Benjamin LaHaise
    Cc: Herbert Xu
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • When an ipv6 address has the tentative flag set, it can't be
    used as source for egress traffic, while the associated route,
    if any, can be looked up and even stored into some dst_cache.

    In the latter scenario, the source ipv6 address selected and
    stored in the cache is most probably wrong (e.g. with
    link-local scope) and the entity using the dst_cache will
    experience lack of ipv6 connectivity until said cache is
    cleared or invalidated.

    Overall this may cause lack of connectivity over most IPv6 tunnels
    (including geneve and vxlan) if the first egress packet reaches
    the tunnel before DAD is completed for the ipv6 address in use.

    This patch bumps a new genid after the IFA_F_TENTATIVE flag
    is cleared, so that the dst_cache will be invalidated on the
    next lookup and ipv6 connectivity restored.

    Fixes: 0c1d70af924b ("net: use dst_cache for vxlan device")
    Fixes: 468dfffcd762 ("geneve: add dst caching support")
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Paolo Abeni
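The invalidation idea can be modelled with a generation counter, assuming a hypothetical single global `route_genid` and a cache entry that remembers the generation it was filled in. This is not the kernel's dst_cache implementation, only the general technique: bumping the counter when the address leaves the tentative state makes every older cache entry miss on its next lookup, so a fresh route (and source address) gets selected.

```c
#include <stdbool.h>

/* Hypothetical global generation counter. */
static unsigned int route_genid;

struct cached_route {
    bool valid;
    unsigned int genid;     /* generation at which the entry was stored */
    int src_addr_id;        /* hypothetical id of the selected source address */
};

static void cache_store(struct cached_route *c, int src_addr_id)
{
    c->valid = true;
    c->genid = route_genid;
    c->src_addr_id = src_addr_id;
}

/* Usable only if no generation bump happened since the entry was stored. */
static bool cache_lookup(const struct cached_route *c, int *src_addr_id)
{
    if (!c->valid || c->genid != route_genid)
        return false;
    *src_addr_id = c->src_addr_id;
    return true;
}

/* Called once DAD completes and the tentative flag is cleared. */
static void address_became_usable(void)
{
    route_genid++;
}
```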
     

24 Nov, 2016

1 commit

  • nf_send_reset6 is not considering the L3 domain and lookups are sent
    to the wrong table. For example consider the following output rule:

    ip6tables -A OUTPUT -p tcp --dport 12345 -j REJECT --reject-with tcp-reset

    using perf to analyze lookups via the fib6_table_lookup tracepoint shows:

    swapper 0 [001] 248.787816: fib6:fib6_table_lookup: table 255 oif 0 iif 1 src 2100:1::3 dst 2100:1:
    ffffffff81439cdc perf_trace_fib6_table_lookup ([kernel.kallsyms])
    ffffffff814c1ce3 trace_fib6_table_lookup ([kernel.kallsyms])
    ffffffff814c3e89 ip6_pol_route ([kernel.kallsyms])
    ffffffff814c40d5 ip6_pol_route_output ([kernel.kallsyms])
    ffffffff814e7b6f fib6_rule_action ([kernel.kallsyms])
    ffffffff81437f60 fib_rules_lookup ([kernel.kallsyms])
    ffffffff814e7c79 fib6_rule_lookup ([kernel.kallsyms])
    ffffffff814c2541 ip6_route_output_flags ([kernel.kallsyms])
    528 nf_send_reset6 ([nf_reject_ipv6])

    The lookup is directed to table 255 rather than the table associated with
    the device via the L3 domain. Update nf_send_reset6 to pull the L3 domain
    from the dst currently attached to the skb.

    Signed-off-by: David Ahern
    Signed-off-by: Pablo Neira Ayuso

    David Ahern
     

18 Nov, 2016

1 commit

  • If an ip6 tunnel is configured to inherit the traffic class from
    the inner header, the dst_cache must be disabled or it will foul
    the policy routing.

    The issue has apparently been present since at least Linux-2.6.12-rc2.

    Reported-by: Liam McBirnie
    Cc: Liam McBirnie
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Paolo Abeni
     

16 Nov, 2016

1 commit

  • Honor udptable parameter that is passed to __udp*_lib_mcast_deliver(),
    otherwise udplite broadcast/multicast use the wrong table and it breaks.

    Fixes: 2dc41cff7545 ("udp: Use hash2 for long hash1 chains in __udp*_lib_mcast_deliver.")
    Signed-off-by: Pablo Neira Ayuso
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Pablo Neira
     

14 Nov, 2016

1 commit

  • With syzkaller help, Marco Grassi found a bug in TCP stack,
    crashing in tcp_collapse()

    Root cause is that sk_filter() can truncate the incoming skb,
    but TCP stack was not really expecting this to happen.
    It probably was expecting a simple DROP or ACCEPT behavior.

    We first need to make sure no part of the TCP header can be removed.
    Then we need to adjust TCP_SKB_CB(skb)->end_seq.
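    A user-space model of the two steps, assuming the filter is applied with
    a cap of doff * 4 bytes (the TCP header length) and end_seq is then
    reduced by the number of bytes the filter removed. tcpm_skb and the
    helpers are hypothetical stand-ins for the skb and its control block.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical stand-in for the fields that matter here. */
    struct tcpm_skb {
        unsigned int len;   /* TCP header + payload bytes */
        uint32_t seq;       /* TCP_SKB_CB(skb)->seq */
        uint32_t end_seq;   /* TCP_SKB_CB(skb)->end_seq */
    };

    /* A filter that wants to keep `keep` bytes, capped so that at least
     * `cap` bytes (the TCP header) always survive. */
    static void tcpm_trim_cap(struct tcpm_skb *skb, unsigned int keep,
                              unsigned int cap)
    {
        unsigned int new_len = keep < cap ? cap : keep;

        if (new_len < skb->len)
            skb->len = new_len;
    }

    /* Trim, then shrink end_seq by the bytes removed so that end_seq still
     * describes what is left in the skb. */
    static void tcpm_filter(struct tcpm_skb *skb, unsigned int keep,
                            unsigned int doff)
    {
        unsigned int old_len = skb->len;
        unsigned int hdr_len = doff * 4;

        assert(skb->len >= hdr_len);
        tcpm_trim_cap(skb, keep, hdr_len);
        skb->end_seq -= old_len - skb->len;
    }
    ```

    For a 20-byte header carrying 100 bytes of payload, a filter asking to
    keep 70 bytes leaves 50 bytes of payload, so end_seq drops by 50.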

    Many thanks to syzkaller team and Marco for giving us a reproducer.

    Signed-off-by: Eric Dumazet
    Reported-by: Marco Grassi
    Reported-by: Vladis Dronov
    Signed-off-by: David S. Miller

    Eric Dumazet
     

10 Nov, 2016

3 commits

  • Lorenzo noted an Android unit test failed due to e0d56fdd7342:
    "The expectation in the test was that the RST replying to a SYN sent to a
    closed port should be generated with oif=0. In other words it should not
    prefer the interface where the SYN came in on, but instead should follow
    whatever the routing table says it should do."

    Revert the change to ip_send_unicast_reply and tcp_v6_send_response such
    that the oif in the flow is set to the skb_iif only if skb_iif is an L3
    master.
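    A sketch of the restored rule, assuming a hypothetical predicate that
    reports whether an ifindex belongs to an L3 master device (here only
    ifindex 10). The reply flow keeps skb_iif only in that case; otherwise oif
    stays 0 and the routing table picks the egress interface.

    ```c
    #include <assert.h>
    #include <stdbool.h>

    /* Hypothetical: ifindex 10 is the only L3 master (VRF) device. */
    static bool l3m_is_l3_master(int ifindex)
    {
        return ifindex == 10;
    }

    /* oif for the RST/ACK reply flow. */
    static int l3m_reply_oif(int skb_iif)
    {
        assert(skb_iif >= 0);
        return l3m_is_l3_master(skb_iif) ? skb_iif : 0;
    }
    ```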

    Fixes: e0d56fdd7342 ("net: l3mdev: remove redundant calls")
    Reported-by: Lorenzo Colitti
    Signed-off-by: David Ahern
    Tested-by: Lorenzo Colitti
    Acked-by: Lorenzo Colitti
    Signed-off-by: David S. Miller

    David Ahern
     
  • Pablo Neira Ayuso says:

    ====================
    Netfilter fixes for net

    The following patchset contains a larger than usual batch of Netfilter
    fixes for your net tree. This series contains a mixture of old bugs and
    recently introduced bugs. They are:

    1) Fix a crash when using nft_dynset with nft_set_rbtree, which doesn't
    support the set element updates from the packet path. From Liping
    Zhang.

    2) Fix leak when nft_expr_clone() fails, from Liping Zhang.

    3) Fix a race when inserting new elements into the set hash from the
    packet path, also from Liping.

    4) Handle segmented TCP SIP packets properly; basically, prevent INVITE
    in the Allow header from creating bogus expectations by performing
    stricter SIP message parsing, from Ulrich Weber.

    5) nft_parse_u32_check() should return a signed integer for errors, from
    John Linville.

    6) Fix the wrong allocation size for connlabels: allocate 16 instead of
    32 bytes, from Florian Westphal.

    7) Fix compilation breakage when building the ip_vs_sync code with
    CONFIG_OPTIMIZE_INLINING on x86, from Arnd Bergmann.

    8) Destroy the new set if the transaction object cannot be allocated,
    also from Liping Zhang.

    9) Use the device to route duplicated packets via nft_dup only when set
    by the user, otherwise packets may not follow the right route, again
    from Liping.

    10) Fix wrong maximum genetlink attribute definition in IPVS, from
    WANG Cong.

    11) Ignore untracked conntrack objects from xt_connmark, from Florian
    Westphal.

    12) Allow conntrack helpers that are registered as NFPROTO_UNSPEC to be
    used via the CT target, otherwise we cannot use the H.245 helper, from
    Florian.

    13) Revisit the garbage collection heuristic in the new workqueue-based
    timer approach for conntrack to evict objects earlier, again from
    Florian.

    14) Fix crash in nf_tables when inserting an element into a verdict map,
    from Liping Zhang.
    ====================
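    Item 5 can be illustrated with a simplified sketch, assuming a helper
    that takes the value directly instead of a netlink attribute (the
    parse32_* names and signatures are hypothetical). An error code returned
    through an unsigned type wraps to a large positive number, so a caller's
    "err < 0" check never fires.

    ```c
    #include <assert.h>
    #include <errno.h>
    #include <stdint.h>

    /* Buggy shape: -ERANGE is converted to a large unsigned value. */
    static unsigned int parse32_check_unsigned(uint32_t val, uint32_t max,
                                               uint32_t *dest)
    {
        assert(dest != NULL);
        if (val > max)
            return -ERANGE;
        *dest = val;
        return 0;
    }

    /* Fixed shape: a signed return lets callers test err < 0. */
    static int parse32_check(uint32_t val, uint32_t max, uint32_t *dest)
    {
        assert(dest != NULL);
        if (val > max)
            return -ERANGE;
        *dest = val;
        return 0;
    }
    ```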

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Routes can specify an mtu explicitly or inherit the mtu from
    the underlying device - this inheritance is implemented in
    dst->ops->mtu handlers ip6_mtu() and ip6_blackhole_mtu().

    Currently changing the mtu of a device adds mtu explicitly
    to routes using that device.

    For example:
    # ip link set dev lo mtu 65536
    # ip -6 route add local 2000::1 dev lo
    # ip -6 route get 2000::1
    local 2000::1 dev lo table local src ... metric 1024 pref medium

    # ip link set dev lo mtu 65535
    # ip -6 route get 2000::1
    local 2000::1 dev lo table local src ... metric 1024 mtu 65535 pref medium

    # ip link set dev lo mtu 65536
    # ip -6 route get 2000::1
    local 2000::1 dev lo table local src ... metric 1024 mtu 65536 pref medium

    # ip -6 route del local 2000::1

    After this patch the route entry no longer changes unless it already has an mtu.
    There is no need to: this inheritance is already done in ip6_mtu().

    # ip link set dev lo mtu 65536
    # ip -6 route add local 2000::1 dev lo
    # ip -6 route add local 2000::2 dev lo mtu 2000
    # ip -6 route get 2000::1; ip -6 route get 2000::2
    local 2000::1 dev lo table local src ... metric 1024 pref medium
    local 2000::2 dev lo table local src ... metric 1024 mtu 2000 pref medium

    # ip link set dev lo mtu 65535
    # ip -6 route get 2000::1; ip -6 route get 2000::2
    local 2000::1 dev lo table local src ... metric 1024 pref medium
    local 2000::2 dev lo table local src ... metric 1024 mtu 2000 pref medium

    # ip link set dev lo mtu 1501
    # ip -6 route get 2000::1; ip -6 route get 2000::2
    local 2000::1 dev lo table local src ... metric 1024 pref medium
    local 2000::2 dev lo table local src ... metric 1024 mtu 1501 pref medium

    # ip link set dev lo mtu 65536
    # ip -6 route get 2000::1; ip -6 route get 2000::2
    local 2000::1 dev lo table local src ... metric 1024 pref medium
    local 2000::2 dev lo table local src ... metric 1024 mtu 65536 pref medium

    # ip -6 route del local 2000::1
    # ip -6 route del local 2000::2

    This is desirable because changing device mtu and then resetting it
    to the previous value shouldn't change the user visible routing table.
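    A model of the per-route update that reproduces the transcript above. The
    exact condition is an assumption inferred from that transcript: routes
    without an explicit mtu (0 here) are untouched; an explicit mtu follows a
    decrease, and follows an increase only if it was equal to the old device
    mtu. mtur_* names are hypothetical.

    ```c
    #include <assert.h>

    /* mtu == 0 means "no explicit mtu, inherit from the device". */
    struct mtur_route {
        unsigned int mtu;
    };

    static void mtur_change_route(struct mtur_route *rt,
                                  unsigned int old_dev_mtu,
                                  unsigned int new_dev_mtu)
    {
        assert(rt != NULL);
        if (rt->mtu == 0)
            return;  /* inherited: computed at lookup time instead */
        if (rt->mtu >= new_dev_mtu ||
            (rt->mtu < new_dev_mtu && rt->mtu == old_dev_mtu))
            rt->mtu = new_dev_mtu;
    }

    /* Effective mtu at lookup time, in the spirit of ip6_mtu(). */
    static unsigned int mtur_effective(const struct mtur_route *rt,
                                       unsigned int dev_mtu)
    {
        assert(rt != NULL);
        return rt->mtu ? rt->mtu : dev_mtu;
    }
    ```

    Replaying 65536 -> 65535 -> 1501 -> 65536 leaves 2000::1 without an mtu
    and moves 2000::2 through 2000, 1501 and 65536, as in the output above.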

    Signed-off-by: Maciej Żenczykowski
    CC: Eric Dumazet
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Maciej Żenczykowski
     

08 Nov, 2016

1 commit

  • icmp6_send is called in response to some event. The skb may not have
    the device set (skb->dev is NULL), but it is expected to have a dst set.
    Update icmp6_send to use the dst on the skb to determine L3 domain.
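    A sketch of the change in emphasis, assuming the L3 domain is the VRF
    master ifindex of a device. The icmpm_* structs are hypothetical; the
    point is that the domain is read from skb->dst->dev, so a NULL skb->dev
    is never dereferenced.

    ```c
    #include <assert.h>
    #include <stddef.h>

    struct icmpm_dev {
        int ifindex;
        int master_ifindex;  /* 0 when not enslaved to a VRF */
    };

    struct icmpm_dst {
        const struct icmpm_dev *dev;
    };

    struct icmpm_skb {
        const struct icmpm_dev *dev;  /* may be NULL here */
        const struct icmpm_dst *dst;  /* expected to be set */
    };

    static int icmpm_l3_domain(const struct icmpm_skb *skb)
    {
        assert(skb != NULL && skb->dst != NULL && skb->dst->dev != NULL);
        return skb->dst->dev->master_ifindex;
    }
    ```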

    Fixes: ca254490c8dfd ("net: Add VRF support to IPv6 stack")
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

03 Nov, 2016

1 commit

  • Some IPCB fields are currently set in udp_tunnel6_xmit_skb(), which are
    never used before it reaches ip6tunnel_xmit(), and past that point the
    control buffer is no longer interpreted as IPCB.

    This patch removes the unused IPCB-related code. Currently there is no
    skb scrubbing in ip6_udp_tunnel; if there were, IPCB(skb)->opt might need
    to be cleared for IPv4 packets, as shown in 5146d1f1511
    ("tunnel: Clear IPCB(skb)->opt before dst_link_failure called").

    Signed-off-by: Eli Cooper
    Signed-off-by: David S. Miller

    Eli Cooper
     

01 Nov, 2016

2 commits

    Prior to this patch, ipv6 did not do an mtu lock check in ip6_update_pmtu.
    As a result, the mtu lock did not really work when an ICMPV6_PKT_TOOBIG
    packet was received.

    This patch adds an mtu lock check to __ip6_rt_update_pmtu, just as ipv4
    does in __ip_rt_update_pmtu.
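    A reduced model of the check, assuming a route carries an mtu value and a
    lock bit (standing in for the RTAX_MTU lock). pmtu_* names are
    hypothetical; a Packet Too Big report may only shrink an unlocked mtu.

    ```c
    #include <assert.h>
    #include <stdbool.h>

    struct pmtu_route {
        unsigned int mtu;
        bool mtu_locked;  /* models the locked RTAX_MTU metric */
    };

    /* Returns true if the report changed the route. */
    static bool pmtu_update(struct pmtu_route *rt, unsigned int reported_mtu)
    {
        assert(rt != NULL);
        if (rt->mtu_locked)
            return false;          /* the new check: locked mtu is final */
        if (reported_mtu >= rt->mtu)
            return false;          /* only shrink the path mtu */
        rt->mtu = reported_mtu;
        return true;
    }
    ```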

    Acked-by: Hannes Frederic Sowa
    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     
  • Similar to commit c146066ab802 ("ipv4: Don't use ufo handling on later
    transformed packets"), don't perform UFO on packets that will be IPsec
    transformed. To detect it we rely on the fact that headerlen in
    dst_entry is non-zero only for transformation bundles (xfrm_dst
    objects).
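    A reduced sketch of the UFO decision, assuming only the inputs named
    here matter (the real condition in the ipv6 output path has more terms,
    such as GSO state and checksum settings). ufo_ctx and model_use_ufo() are
    hypothetical; the new term is dst_header_len == 0.

    ```c
    #include <assert.h>
    #include <stdbool.h>

    struct ufo_ctx {
        unsigned int len;             /* length + fragheaderlen */
        unsigned int mtu;
        bool is_udp_dgram;            /* UDP datagram socket */
        bool dev_has_ufo;             /* NETIF_F_UFO on the output device */
        unsigned int dst_header_len;  /* non-zero only for xfrm bundles */
    };

    static bool model_use_ufo(const struct ufo_ctx *c)
    {
        assert(c != NULL);
        return c->len > c->mtu &&
               c->is_udp_dgram &&
               c->dev_has_ufo &&
               c->dst_header_len == 0;  /* skip packets IPsec will transform */
    }
    ```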

    Unwanted segmentation can be observed with a NETIF_F_UFO capable device,
    such as a dummy device:

    DEV=dum0 LEN=1493

    ip li add $DEV type dummy
    ip addr add fc00::1/64 dev $DEV nodad
    ip link set $DEV up
    ip xfrm policy add dir out src fc00::1 dst fc00::2 \
    tmpl src fc00::1 dst fc00::2 proto esp spi 1
    ip xfrm state add src fc00::1 dst fc00::2 \
    proto esp spi 1 enc 'aes' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b

    tcpdump -n -nn -i $DEV -t &
    socat /dev/zero,readbytes=$LEN udp6:[fc00::2]:$LEN

    tcpdump output before:

    IP6 fc00::1 > fc00::2: frag (0|1448) ESP(spi=0x00000001,seq=0x1), length 1448
    IP6 fc00::1 > fc00::2: frag (1448|48)
    IP6 fc00::1 > fc00::2: ESP(spi=0x00000001,seq=0x2), length 88

    ... and after:

    IP6 fc00::1 > fc00::2: frag (0|1448) ESP(spi=0x00000001,seq=0x1), length 1448
    IP6 fc00::1 > fc00::2: frag (1448|80)

    Fixes: e89e9cf539a2 ("[IPv4/IPv6]: UFO Scatter-gather approach")

    Signed-off-by: Jakub Sitnicki
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Jakub Sitnicki