18 Oct, 2018

4 commits

  • [ Upstream commit 474ff2600889e16280dbc6ada8bfecb216169a70 ]

    So it should not fail with EPERM even though it is no longer implemented...

    This is a fix for:
    (userns)$ egrep ^Cap /proc/self/status
    CapInh: 0000003fffffffff
    CapPrm: 0000003fffffffff
    CapEff: 0000003fffffffff
    CapBnd: 0000003fffffffff
    CapAmb: 0000003fffffffff

    (userns)$ tcpdump -i usb_rndis0
    tcpdump: WARNING: usb_rndis0: SIOCETHTOOL(ETHTOOL_GUFO) ioctl failed: Operation not permitted
    Warning: Kernel filter failed: Bad file descriptor
    tcpdump: can't remove kernel filter: Bad file descriptor

    With this change it returns EOPNOTSUPP instead of EPERM.

    See also https://github.com/the-tcpdump-group/libpcap/issues/689

    Fixes: 08a00fea6de2 "net: Remove references to NETIF_F_UFO from ethtool."
    Cc: David S. Miller
    Signed-off-by: Maciej Żenczykowski
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Maciej Żenczykowski
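
The difference between the two error codes can be sketched with a small user-space model. This is not the kernel's ethtool dispatcher: the command IDs, the `has_net_admin` flag and the function name are all hypothetical, and the sketch only assumes that a capability check runs before the command is dispatched.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical command IDs for this sketch; not the real ethtool values. */
enum sketch_cmd { SKETCH_GET_LINK, SKETCH_GET_UFO, SKETCH_SET_SOMETHING };

/* Returns 0 or a negative errno: capability check first, then dispatch. */
int sketch_ethtool(enum sketch_cmd cmd, bool has_net_admin)
{
    switch (cmd) {
    case SKETCH_GET_LINK:
    case SKETCH_GET_UFO:
        /* Read-only queries are exempt from the capability check. */
        break;
    default:
        if (!has_net_admin)
            return -EPERM;
    }

    switch (cmd) {
    case SKETCH_GET_LINK:
    case SKETCH_SET_SOMETHING:
        return 0;
    default:
        /* Known command that is no longer implemented. */
        return -EOPNOTSUPP;
    }
}
```

In this model an unprivileged caller asking for `SKETCH_GET_UFO` gets `-EOPNOTSUPP`, which a tool can treat as "feature absent" rather than as a permission failure.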
     
  • [ Upstream commit 0e1d6eca5113858ed2caea61a5adc03c595f6096 ]

    We have an impressive number of syzkaller bugs that are linked
    to the fact that syzbot was able to create a networking device
    with millions of TX (or RX) queues.

    Let's limit the number of RX/TX queues to 4096; this really should
    cover all known cases.

    A separate patch will add various cond_resched() in the loops
    handling sysfs entries at device creation and dismantle.

    Tested:

    lpaa6:~# ip link add gre-4097 numtxqueues 4097 numrxqueues 4097 type ip6gretap
    RTNETLINK answers: Invalid argument

    lpaa6:~# time ip link add gre-4096 numtxqueues 4096 numrxqueues 4096 type ip6gretap

    real 0m0.180s
    user 0m0.000s
    sys 0m0.107s

    Fixes: 76ff5cc91935 ("rtnl: allow to specify number of rx and tx queues on device creation")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
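
A bounds check of the kind described above might look like the following sketch. The function and macro names are hypothetical, and the lower bound of 1 is an assumption; only the 4096 limit and the "Invalid argument" result come from the commit message.

```c
#include <errno.h>

#define SKETCH_MAX_QUEUES 4096u  /* upper limit named in the commit message */

/* Returns 0 if the requested queue count is acceptable, -EINVAL otherwise.
 * The lower bound of 1 is an assumption made for this sketch. */
int sketch_check_queues(unsigned int requested)
{
    if (requested < 1 || requested > SKETCH_MAX_QUEUES)
        return -EINVAL;
    return 0;
}
```

With this check, a request for 4096 queues passes and a request for 4097 is rejected, matching the two `ip link add` runs shown in the test output.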
     
  • [ Upstream commit bd961c9bc66497f0c63f4ba1d02900bb85078366 ]

    Currently, rtnl_fdb_dump() assumes the family header is 'struct ifinfomsg',
    which is not always true -- 'struct ndmsg' is used by iproute2 ('ip neigh').

    The problem is, the function bails out early if nlmsg_parse() fails, which
    does occur for iproute2 usage of 'struct ndmsg' because the payload length
    is shorter than the family header alone (as 'struct ifinfomsg' is assumed).

    This breaks backward compatibility with userspace -- nothing is sent back.

    Some examples with iproute2 and netlink library for go [1]:

    1) $ bridge fdb show
    33:33:00:00:00:01 dev ens3 self permanent
    01:00:5e:00:00:01 dev ens3 self permanent
    33:33:ff:15:98:30 dev ens3 self permanent

    This one works, as it uses 'struct ifinfomsg'.

    fdb_show() @ iproute2/bridge/fdb.c
    """
    .n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
    ...
    if (rtnl_dump_request(&rth, RTM_GETNEIGH, [...]
    """

    2) $ ip --family bridge neigh
    RTNETLINK answers: Invalid argument
    Dump terminated

    This one fails, as it uses 'struct ndmsg'.

    do_show_or_flush() @ iproute2/ip/ipneigh.c
    """
    .n.nlmsg_type = RTM_GETNEIGH,
    .n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ndmsg)),
    """

    3) $ ./neighlist
    < no output >

    This one fails, as it is 'struct ndmsg'-based.

    neighList() @ netlink/neigh_linux.go
    """
    req := h.newNetlinkRequest(unix.RTM_GETNEIGH, [...]
    msg := Ndmsg{
    """

    The actual breakage was introduced by commit 0ff50e83b512 ("net: rtnetlink:
    bail out from rtnl_fdb_dump() on parse error"), because nlmsg_parse() fails
    if the payload length (with the _actual_ family header) is less than the
    family header length alone (which is assumed, in parameter 'hdrlen').
    This is true in the examples above with struct ndmsg, with size and payload
    length shorter than struct ifinfomsg.

    However, that commit only intends to fix a problem under the assumption
    that the family header is indeed a 'struct ifinfomsg', by preventing
    access to the payload as such (via the 'ifm' pointer) if the payload
    length is not sufficient to actually contain it.

    The assumption was introduced by commit 5e6d24358799 ("bridge: netlink dump
    interface at par with brctl"), to support iproute2's 'bridge fdb' command
    (not 'ip neigh') which indeed uses 'struct ifinfomsg', thus is not broken.

    So, in order to unbreak the 'struct ndmsg' family headers and still allow
    'struct ifinfomsg' to continue to work, check for the known message sizes
    used with 'struct ndmsg' in iproute2 (with zero or one attribute which is
    not used in this function anyway) then do not parse the data as ifinfomsg.

    Same examples with this patch applied (or with the original fix reverted):

    $ bridge fdb show
    33:33:00:00:00:01 dev ens3 self permanent
    01:00:5e:00:00:01 dev ens3 self permanent
    33:33:ff:15:98:30 dev ens3 self permanent

    $ ip --family bridge neigh
    dev ens3 lladdr 33:33:00:00:00:01 PERMANENT
    dev ens3 lladdr 01:00:5e:00:00:01 PERMANENT
    dev ens3 lladdr 33:33:ff:15:98:30 PERMANENT

    $ ./neighlist
    netlink.Neigh{LinkIndex:2, Family:7, State:128, Type:0, Flags:2, IP:net.IP(nil), HardwareAddr:net.HardwareAddr{0x33, 0x33, 0x0, 0x0, 0x0, 0x1}, LLIPAddr:net.IP(nil), Vlan:0, VNI:0}
    netlink.Neigh{LinkIndex:2, Family:7, State:128, Type:0, Flags:2, IP:net.IP(nil), HardwareAddr:net.HardwareAddr{0x1, 0x0, 0x5e, 0x0, 0x0, 0x1}, LLIPAddr:net.IP(nil), Vlan:0, VNI:0}
    netlink.Neigh{LinkIndex:2, Family:7, State:128, Type:0, Flags:2, IP:net.IP(nil), HardwareAddr:net.HardwareAddr{0x33, 0x33, 0xff, 0x15, 0x98, 0x30}, LLIPAddr:net.IP(nil), Vlan:0, VNI:0}

    Tested on mainline (v4.19-rc6) and net-next (3bd09b05b068).

    References:

    [1] netlink library for go (test-case)
    https://github.com/vishvananda/netlink

    $ cat ~/go/src/neighlist/main.go
    package main
    import ("fmt"; "syscall"; "github.com/vishvananda/netlink")
    func main() {
    neighs, _ := netlink.NeighList(0, syscall.AF_BRIDGE)
    for _, neigh := range neighs { fmt.Printf("%#v\n", neigh) }
    }

    $ export GOPATH=~/go
    $ go get github.com/vishvananda/netlink
    $ go build neighlist
    $ ~/go/src/neighlist/neighlist

    Thanks to David Ahern for suggestions to improve this patch.

    Fixes: 0ff50e83b512 ("net: rtnetlink: bail out from rtnl_fdb_dump() on parse error")
    Fixes: 5e6d24358799 ("bridge: netlink dump interface at par with brctl")
    Reported-by: Aidan Obley
    Signed-off-by: Mauricio Faria de Oliveira
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Mauricio Faria de Oliveira
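
The size mismatch behind this breakage, and a length-based check of the kind the message describes, can be sketched in user space. The two structs are local mirrors of the family headers, not the uapi definitions themselves, and the helper name is hypothetical. The sketch also assumes that the single optional attribute carries a 32-bit value behind a 4-byte attribute header.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the neighbour family header (12 bytes). */
struct sketch_ndmsg {
    uint8_t  ndm_family;
    uint8_t  ndm_pad1;
    uint16_t ndm_pad2;
    int32_t  ndm_ifindex;
    uint16_t ndm_state;
    uint8_t  ndm_flags;
    uint8_t  ndm_type;
};

/* Local mirror of the link family header (16 bytes). */
struct sketch_ifinfomsg {
    uint8_t  ifi_family;
    uint8_t  ifi_pad;
    uint16_t ifi_type;
    int32_t  ifi_index;
    uint32_t ifi_flags;
    uint32_t ifi_change;
};

#define SKETCH_NLA_HDRLEN 4u  /* assumed aligned attribute header size */

/* True if the payload length matches an ndmsg request carrying zero
 * attributes or one 32-bit attribute (an assumption of this sketch). */
bool sketch_looks_like_ndmsg(size_t payload_len)
{
    size_t base = sizeof(struct sketch_ndmsg);

    return payload_len == base ||
           payload_len == base + SKETCH_NLA_HDRLEN + sizeof(uint32_t);
}
```

A bare `struct ndmsg` payload is 4 bytes shorter than `struct ifinfomsg`, which is why parsing it with the larger header length fails; a 16-byte payload is not treated as ndmsg here and would still be parsed as ifinfomsg.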
     
  • [ Upstream commit af7d6cce53694a88d6a1bb60c9a239a6a5144459 ]

    Since commit 5aad1de5ea2c ("ipv4: use separate genid for next hop
    exceptions"), exceptions get deprecated separately from cached
    routes. In particular, administrative changes don't clear PMTU anymore.

    As Stefano described in commit e9fa1495d738 ("ipv6: Reflect MTU changes
    on PMTU of exceptions for MTU-less routes"), the PMTU discovered before
    the local MTU change can become stale:
    - if the local MTU is now lower than the PMTU, that PMTU is now
    incorrect
    - if the local MTU was the lowest value in the path, and is increased,
    we might discover a higher PMTU

    Similarly to what commit e9fa1495d738 did for IPv6, update PMTU in those
    cases.

    If the exception was locked, the discovered PMTU was smaller than the
    minimal accepted PMTU. In that case, if the new local MTU is smaller
    than the current PMTU, let PMTU discovery figure out if locking of the
    exception is still needed.

    To do this, we need to know the old link MTU in the NETDEV_CHANGEMTU
    notifier. By the time the notifier is called, dev->mtu has been
    changed. This patch adds the old MTU as additional information in the
    notifier structure, and a new call_netdevice_notifiers_u32() function.

    Fixes: 5aad1de5ea2c ("ipv4: use separate genid for next hop exceptions")
    Signed-off-by: Sabrina Dubroca
    Reviewed-by: Stefano Brivio
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Sabrina Dubroca
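
The update rules described in this message can be modelled as a small pure function. This is a sketch of the stated rules only, not the kernel code: the struct, its fields and the function name are hypothetical, and clearing the lock is how the sketch expresses "let PMTU discovery figure out if locking is still needed".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a cached next hop exception. */
struct sketch_exception {
    uint32_t pmtu;    /* PMTU recorded for this exception */
    bool     locked;  /* discovered PMTU was below the minimum accepted */
};

/* Returns the exception as it would look after the link MTU changed
 * from old_mtu to new_mtu. */
struct sketch_exception sketch_update_pmtu(struct sketch_exception e,
                                           uint32_t old_mtu,
                                           uint32_t new_mtu)
{
    if (e.locked) {
        /* New MTU is below the PMTU: drop the lock and let
         * discovery decide again. */
        if (new_mtu < e.pmtu) {
            e.pmtu = new_mtu;
            e.locked = false;
        }
    } else if (new_mtu < e.pmtu || old_mtu == e.pmtu) {
        /* Either the PMTU is now too high, or the local link was the
         * bottleneck and a higher PMTU may be discoverable. */
        e.pmtu = new_mtu;
    }
    return e;
}
```

The `old_mtu` parameter is the reason the notifier needs the previous MTU: without it, the "local link was the bottleneck" case cannot be detected.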
     

29 Sep, 2018

1 commit

  • [ Upstream commit f0e0d04413fcce9bc76388839099aee93cd0d33b ]

    Update the 'confirmed' timestamp when an ARP packet is received. This
    shouldn't affect locktime logic, and the entry can be confirmed by any
    higher-layer protocol anyway. Thus it makes sense to confirm it when an
    ARP packet is received.

    Fixes: 77d7123342dc ("neighbour: update neigh timestamps iff update is effective")
    Signed-off-by: Vasily Khoruzhick
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Vasily Khoruzhick
     

26 Sep, 2018

1 commit

  • [ Upstream commit 5cf4a8532c992bb22a9ecd5f6d93f873f4eaccc2 ]

    According to the documentation in msg_zerocopy.rst, the SO_ZEROCOPY
    flag was introduced because send(2) ignores unknown message flags and
    any legacy application which was accidentally passing the equivalent of
    MSG_ZEROCOPY earlier should not see any new behaviour.

    Before commit f214f915e7db ("tcp: enable MSG_ZEROCOPY"), a send(2) call
    which passed the equivalent of MSG_ZEROCOPY without setting SO_ZEROCOPY
    would succeed. However, after that commit, it fails with -ENOBUFS. So
    it appears that the SO_ZEROCOPY flag fails to fulfill its intended
    purpose. Fix it.

    Fixes: f214f915e7db ("tcp: enable MSG_ZEROCOPY")
    Signed-off-by: Vincent Whitchurch
    Acked-by: Willem de Bruijn
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Vincent Whitchurch
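
The intended contract can be stated as a one-line predicate. The sketch below is a model, not the kernel's send path: the flag value and names are hypothetical, and it assumes the fix makes the kernel fall back to an ordinary copying send when the socket has not opted in.

```c
#include <stdbool.h>

#define SKETCH_MSG_ZEROCOPY 0x1  /* hypothetical flag bit for this sketch */

/* True only if the caller passed the per-call flag AND the socket opted
 * in beforehand; otherwise the flag is ignored and the data is copied. */
bool sketch_use_zerocopy(int send_flags, bool so_zerocopy_enabled)
{
    return (send_flags & SKETCH_MSG_ZEROCOPY) && so_zerocopy_enabled;
}
```

Under this model a legacy application that passes the flag by accident sees no new behaviour, which is the purpose the message attributes to SO_ZEROCOPY.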
     

20 Sep, 2018

3 commits

  • After working on IP defragmentation lately, I found that some large
    packets defeat the CHECKSUM_COMPLETE optimization because the NIC adds
    zero padding to the last (small) fragment.

    While removing the padding with pskb_trim_rcsum(), we set skb->ip_summed
    to CHECKSUM_NONE, forcing a full csum validation, even if all prior
    fragments had CHECKSUM_COMPLETE set.

    We can instead compute the checksum of the part we are trimming,
    usually smaller than the part we keep.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    (cherry picked from commit 88078d98d1bb085d72af8437707279e203524fa5)
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
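
The idea of deriving the checksum of the kept part from the checksum of the trimmed part can be shown with plain one's-complement arithmetic. This is a user-space sketch with hypothetical function names, not the kernel helpers, and it assumes the trim point is at an even byte offset so that no byte-swapping of the partial sum is needed.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit one's-complement sum. */
static uint16_t sketch_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)sum;
}

/* One's-complement sum of a buffer, read as big-endian 16-bit words.
 * An odd trailing byte is padded with a zero byte. */
uint16_t sketch_csum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum = sketch_fold(sum + (((uint32_t)buf[i] << 8) | buf[i + 1]));
    if (i < len)
        sum = sketch_fold(sum + ((uint32_t)buf[i] << 8));
    return (uint16_t)sum;
}

/* Checksum of the first `keep` bytes, derived from the checksum of the
 * whole buffer by subtracting the checksum of the trimmed tail.
 * Assumes `keep` is even and keep <= len. */
uint16_t sketch_csum_after_trim(uint16_t full, const uint8_t *buf,
                                size_t keep, size_t len)
{
    uint16_t tail = sketch_csum(buf + keep, len - keep);

    /* One's-complement subtraction: a - b is a + ~b. */
    return sketch_fold((uint32_t)full + (uint16_t)~tail);
}
```

Only the trimmed tail is walked, so when the padding is a few bytes on a large packet the work is proportional to the padding rather than to the payload.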
     
  • Tested: see the next patch in the series.

    Suggested-by: Eric Dumazet
    Signed-off-by: Peter Oskolkov
    Signed-off-by: Eric Dumazet
    Cc: Florian Westphal
    Signed-off-by: David S. Miller
    (cherry picked from commit 385114dec8a49b5e5945e77ba7de6356106713f4)
    Signed-off-by: Greg Kroah-Hartman

    Peter Oskolkov
     
  • As measured in my prior patch ("sch_netem: faster rb tree removal"),
    rbtree_postorder_for_each_entry_safe() is nice looking but much slower
    than using rb_next() directly, except when tree is small enough
    to fit in CPU caches (then the cost is the same)

    Also note that there is not even an increase of text size :
    $ size net/core/skbuff.o.before net/core/skbuff.o
    text data bss dec hex filename
    40711 1298 0 42009 a419 net/core/skbuff.o.before
    40711 1298 0 42009 a419 net/core/skbuff.o

    From: Eric Dumazet

    Signed-off-by: David S. Miller
    (cherry picked from commit 7c90584c66cc4b033a3b684b0e0950f79e7b7166)
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     

05 Sep, 2018

1 commit

  • [ Upstream commit 71eb5255f55bdb484d35ff7c9a1803f453dfbf82 ]

    bpf_parse_prog() is protected by rcu_read_lock(),
    so GFP_KERNEL is not allowed in bpf_parse_prog().

    [51015.579396] =============================
    [51015.579418] WARNING: suspicious RCU usage
    [51015.579444] 4.18.0-rc6+ #208 Not tainted
    [51015.579464] -----------------------------
    [51015.579488] ./include/linux/rcupdate.h:303 Illegal context switch in RCU read-side critical section!
    [51015.579510] other info that might help us debug this:
    [51015.579532] rcu_scheduler_active = 2, debug_locks = 1
    [51015.579556] 2 locks held by ip/1861:
    [51015.579577] #0: 00000000a8c12fd1 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x2e0/0x910
    [51015.579711] #1: 00000000bf815f8e (rcu_read_lock){....}, at: lwtunnel_build_state+0x96/0x390
    [51015.579842] stack backtrace:
    [51015.579869] CPU: 0 PID: 1861 Comm: ip Not tainted 4.18.0-rc6+ #208
    [51015.579891] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015
    [51015.579911] Call Trace:
    [51015.579950] dump_stack+0x74/0xbb
    [51015.580000] ___might_sleep+0x16b/0x3a0
    [51015.580047] __kmalloc_track_caller+0x220/0x380
    [51015.580077] kmemdup+0x1c/0x40
    [51015.580077] bpf_parse_prog+0x10e/0x230
    [51015.580164] ? kasan_kmalloc+0xa0/0xd0
    [51015.580164] ? bpf_destroy_state+0x30/0x30
    [51015.580164] ? bpf_build_state+0xe2/0x3e0
    [51015.580164] bpf_build_state+0x1bb/0x3e0
    [51015.580164] ? bpf_parse_prog+0x230/0x230
    [51015.580164] ? lock_is_held_type+0x123/0x1a0
    [51015.580164] lwtunnel_build_state+0x1aa/0x390
    [51015.580164] fib_create_info+0x1579/0x33d0
    [51015.580164] ? sched_clock_local+0xe2/0x150
    [51015.580164] ? fib_info_update_nh_saddr+0x1f0/0x1f0
    [51015.580164] ? sched_clock_local+0xe2/0x150
    [51015.580164] fib_table_insert+0x201/0x1990
    [51015.580164] ? lock_downgrade+0x610/0x610
    [51015.580164] ? fib_table_lookup+0x1920/0x1920
    [51015.580164] ? lwtunnel_valid_encap_type.part.6+0xcb/0x3a0
    [51015.580164] ? rtm_to_fib_config+0x637/0xbd0
    [51015.580164] inet_rtm_newroute+0xed/0x1b0
    [51015.580164] ? rtm_to_fib_config+0xbd0/0xbd0
    [51015.580164] rtnetlink_rcv_msg+0x331/0x910
    [ ... ]

    Fixes: 3a0af8fd61f9 ("bpf: BPF for lightweight tunnel infrastructure")
    Signed-off-by: Taehee Yoo
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Taehee Yoo
     

24 Aug, 2018

1 commit

  • [ Upstream commit 7892bd081045222b9e4027fec279a28d6fe7aa66 ]

    If dev_get_valid_name() fails, propagate its return code.

    Also remove the assignment of ENODEV to err; it will be set to
    0 again before dev_change_net_namespace() exits.

    Signed-off-by: Li RongQing
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Li RongQing
     

28 Jul, 2018

2 commits

  • [ Upstream commit 5025f7f7d506fba9b39e7fe8ca10f6f34cb9bc2d ]

    rtnl_configure_link sets dev->rtnl_link_state to
    RTNL_LINK_INITIALIZED and unconditionally calls
    __dev_notify_flags to notify user-space of dev flags.

    Current call sequence for rtnl_configure_link:
    rtnetlink_newlink
    rtnl_link_ops->newlink
    rtnl_configure_link (unconditionally notifies userspace of
    default and new dev flags)

    If a newlink handler wants to call rtnl_configure_link
    early, we will end up with duplicate notifications to
    user-space.

    This patch fixes rtnl_configure_link to check rtnl_link_state
    and call __dev_notify_flags with gchanges = 0 if already
    RTNL_LINK_INITIALIZED.

    Later in the series, this patch lets a driver implementing newlink
    call rtnl_configure_link to initialize the link early, which makes
    the following call sequence work:
    rtnetlink_newlink
    rtnl_link_ops->newlink (vxlan) -> rtnl_configure_link (initializes
    link and notifies
    user-space of default
    dev flags)
    rtnl_configure_link (updates dev flags if requested by user ifm
    and notifies user-space of new dev flags)

    Signed-off-by: Roopa Prabhu
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Roopa Prabhu
     
  • [ Upstream commit ff907a11a0d68a749ce1a321f4505c03bf72190c ]

    syzbot caught a NULL deref [1], caused by skb_segment()

    skb_segment() has many "goto err;" that assume the @err variable
    contains -ENOMEM.

    A successful call to __skb_linearize() should not clear @err,
    otherwise a subsequent memory allocation error could return NULL.

    While we are at it, we might use -EINVAL instead of -ENOMEM when
    MAX_SKB_FRAGS limit is reached.

    [1]
    kasan: CONFIG_KASAN_INLINE enabled
    kasan: GPF could be caused by NULL-ptr deref or user memory access
    general protection fault: 0000 [#1] SMP KASAN
    CPU: 0 PID: 13285 Comm: syz-executor3 Not tainted 4.18.0-rc4+ #146
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:tcp_gso_segment+0x3dc/0x1780 net/ipv4/tcp_offload.c:106
    Code: f0 ff ff 0f 87 1c fd ff ff e8 00 88 0b fb 48 8b 75 d0 48 b9 00 00 00 00 00 fc ff df 48 8d be 90 00 00 00 48 89 f8 48 c1 e8 03 b6 14 08 48 8d 86 94 00 00 00 48 89 c6 83 e0 07 48 c1 ee 03 0f
    RSP: 0018:ffff88019b7fd060 EFLAGS: 00010206
    RAX: 0000000000000012 RBX: 0000000000000020 RCX: dffffc0000000000
    RDX: 0000000000040000 RSI: 0000000000000000 RDI: 0000000000000090
    RBP: ffff88019b7fd0f0 R08: ffff88019510e0c0 R09: ffffed003b5c46d6
    R10: ffffed003b5c46d6 R11: ffff8801dae236b3 R12: 0000000000000001
    R13: ffff8801d6c581f4 R14: 0000000000000000 R15: ffff8801d6c58128
    FS: 00007fcae64d6700(0000) GS:ffff8801dae00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00000000004e8664 CR3: 00000001b669b000 CR4: 00000000001406f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    tcp4_gso_segment+0x1c3/0x440 net/ipv4/tcp_offload.c:54
    inet_gso_segment+0x64e/0x12d0 net/ipv4/af_inet.c:1342
    inet_gso_segment+0x64e/0x12d0 net/ipv4/af_inet.c:1342
    skb_mac_gso_segment+0x3b5/0x740 net/core/dev.c:2792
    __skb_gso_segment+0x3c3/0x880 net/core/dev.c:2865
    skb_gso_segment include/linux/netdevice.h:4099 [inline]
    validate_xmit_skb+0x640/0xf30 net/core/dev.c:3104
    __dev_queue_xmit+0xc14/0x3910 net/core/dev.c:3561
    dev_queue_xmit+0x17/0x20 net/core/dev.c:3602
    neigh_hh_output include/net/neighbour.h:473 [inline]
    neigh_output include/net/neighbour.h:481 [inline]
    ip_finish_output2+0x1063/0x1860 net/ipv4/ip_output.c:229
    ip_finish_output+0x841/0xfa0 net/ipv4/ip_output.c:317
    NF_HOOK_COND include/linux/netfilter.h:276 [inline]
    ip_output+0x223/0x880 net/ipv4/ip_output.c:405
    dst_output include/net/dst.h:444 [inline]
    ip_local_out+0xc5/0x1b0 net/ipv4/ip_output.c:124
    iptunnel_xmit+0x567/0x850 net/ipv4/ip_tunnel_core.c:91
    ip_tunnel_xmit+0x1598/0x3af1 net/ipv4/ip_tunnel.c:778
    ipip_tunnel_xmit+0x264/0x2c0 net/ipv4/ipip.c:308
    __netdev_start_xmit include/linux/netdevice.h:4148 [inline]
    netdev_start_xmit include/linux/netdevice.h:4157 [inline]
    xmit_one net/core/dev.c:3034 [inline]
    dev_hard_start_xmit+0x26c/0xc30 net/core/dev.c:3050
    __dev_queue_xmit+0x29ef/0x3910 net/core/dev.c:3569
    dev_queue_xmit+0x17/0x20 net/core/dev.c:3602
    neigh_direct_output+0x15/0x20 net/core/neighbour.c:1403
    neigh_output include/net/neighbour.h:483 [inline]
    ip_finish_output2+0xa67/0x1860 net/ipv4/ip_output.c:229
    ip_finish_output+0x841/0xfa0 net/ipv4/ip_output.c:317
    NF_HOOK_COND include/linux/netfilter.h:276 [inline]
    ip_output+0x223/0x880 net/ipv4/ip_output.c:405
    dst_output include/net/dst.h:444 [inline]
    ip_local_out+0xc5/0x1b0 net/ipv4/ip_output.c:124
    ip_queue_xmit+0x9df/0x1f80 net/ipv4/ip_output.c:504
    tcp_transmit_skb+0x1bf9/0x3f10 net/ipv4/tcp_output.c:1168
    tcp_write_xmit+0x1641/0x5c20 net/ipv4/tcp_output.c:2363
    __tcp_push_pending_frames+0xb2/0x290 net/ipv4/tcp_output.c:2536
    tcp_push+0x638/0x8c0 net/ipv4/tcp.c:735
    tcp_sendmsg_locked+0x2ec5/0x3f00 net/ipv4/tcp.c:1410
    tcp_sendmsg+0x2f/0x50 net/ipv4/tcp.c:1447
    inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
    sock_sendmsg_nosec net/socket.c:641 [inline]
    sock_sendmsg+0xd5/0x120 net/socket.c:651
    __sys_sendto+0x3d7/0x670 net/socket.c:1797
    __do_sys_sendto net/socket.c:1809 [inline]
    __se_sys_sendto net/socket.c:1805 [inline]
    __x64_sys_sendto+0xe1/0x1a0 net/socket.c:1805
    do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x455ab9
    Code: 1d ba fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 0f 83 eb b9 fb ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007fcae64d5c68 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
    RAX: ffffffffffffffda RBX: 00007fcae64d66d4 RCX: 0000000000455ab9
    RDX: 0000000000000001 RSI: 0000000020000200 RDI: 0000000000000013
    RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000014
    R13: 00000000004c1145 R14: 00000000004d1818 R15: 0000000000000006
    Modules linked in:
    Dumping ftrace buffer:
    (ftrace buffer empty)

    Fixes: ddff00d42043 ("net: Move skb_has_shared_frag check out of GRE code and into segmentation")
    Signed-off-by: Eric Dumazet
    Cc: Alexander Duyck
    Reported-by: syzbot
    Acked-by: Alexander Duyck
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     

25 Jul, 2018

3 commits

  • [ Upstream commit e78bfb0751d4e312699106ba7efbed2bab1a53ca ]

    Commit 8b7008620b84 ("net: Don't copy pfmemalloc flag in
    __copy_skb_header()") introduced a different handling for the
    pfmemalloc flag in copy and clone paths.

    In __skb_clone(), now, the flag is set only if it was set in the
    original skb, but not cleared if it wasn't. This is wrong and
    might lead to socket buffers being flagged with pfmemalloc even
    if the skb data wasn't allocated from pfmemalloc reserves. Copy
    the flag instead of ORing it.

    Reported-by: Sabrina Dubroca
    Fixes: 8b7008620b84 ("net: Don't copy pfmemalloc flag in __copy_skb_header()")
    Signed-off-by: Stefano Brivio
    Tested-by: Sabrina Dubroca
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Stefano Brivio
     
  • [ Upstream commit 8b7008620b8452728cadead460a36f64ed78c460 ]

    The pfmemalloc flag indicates that the skb was allocated from
    the PFMEMALLOC reserves, and the flag is currently copied on skb
    copy and clone.

    However, an skb copied from an skb flagged with pfmemalloc
    wasn't necessarily allocated from PFMEMALLOC reserves, and on
    the other hand an skb allocated that way might be copied from an
    skb that wasn't.

    So we should not copy the flag on skb copy, and rather decide
    whether to allow an skb to be associated with sockets unrelated
    to page reclaim depending only on how it was allocated.

    Move the pfmemalloc flag before headers_start[0] using an
    existing 1-bit hole, so that __copy_skb_header() doesn't copy
    it.

    When cloning, we'll now take care of this flag explicitly,
    contravening the warning comment in __skb_clone().

    While at it, restore the newline usage introduced by commit
    b19372273164 ("net: reorganize sk_buff for faster
    __copy_skb_header()") to visually separate bytes used in
    bitfields after headers_start[0], that was gone after commit
    a9e419dc7be6 ("netfilter: merge ctinfo into nfct pointer storage
    area"), and describe the pfmemalloc flag in the kernel-doc
    structure comment.

    This doesn't change the size of sk_buff or cacheline boundaries,
    but consolidates the 15 bits hole before tc_index into a 2 bytes
    hole before csum, that could now be filled more easily.

    Reported-by: Patrick Talbert
    Fixes: c93bdd0e03e8 ("netvm: allow skb allocation to use PFMEMALLOC reserves")
    Signed-off-by: Stefano Brivio
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Stefano Brivio
     
  • [ Upstream commit d5a672ac9f48f81b20b1cad1d9ed7bbf4e418d4c ]

    The gen_stats facility will add a header for the toplevel nlattr of type
    TCA_STATS2 that contains all stats added by qdisc callbacks. A reference
    to this header is stored in the gnet_dump struct, and when all the
    per-qdisc callbacks have finished adding their stats, the length of the
    containing header will be adjusted to the right value.

    However, on architectures that need padding (i.e., that don't set
    CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS), the padding nlattr is added
    before the stats, which means that the stored pointer will point to the
    padding, and so when the header is fixed up, the result is just a very
    big padding nlattr. Because most qdiscs also supply the legacy TCA_STATS
    struct, this problem has been mostly invisible, but we exposed it with
    the netlink attribute-based statistics in CAKE.

    Fix the issue by fixing up the stored pointer if it points to a padding
    nlattr.

    Tested-by: Pete Heist
    Tested-by: Kevin Darbyshire-Bryant
    Signed-off-by: Toke Høiland-Jørgensen
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Toke Høiland-Jørgensen
     

12 Jun, 2018

3 commits

  • [ Upstream commit 644c7eebbfd59e72982d11ec6cc7d39af12450ae ]

    It seems that rtnl_group_changelink() can call do_setlink()
    while a prior call to validate_linkmsg(dev = NULL, ...) could
    not validate IFLA_ADDRESS / IFLA_BROADCAST.

    Make sure do_setlink() calls validate_linkmsg() itself instead
    of leaving this responsibility to its callers.

    With help from Dmitry Vyukov, thanks a lot !

    BUG: KMSAN: uninit-value in is_valid_ether_addr include/linux/etherdevice.h:199 [inline]
    BUG: KMSAN: uninit-value in eth_prepare_mac_addr_change net/ethernet/eth.c:275 [inline]
    BUG: KMSAN: uninit-value in eth_mac_addr+0x203/0x2b0 net/ethernet/eth.c:308
    CPU: 1 PID: 8695 Comm: syz-executor3 Not tainted 4.17.0-rc5+ #103
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x185/0x1d0 lib/dump_stack.c:113
    kmsan_report+0x149/0x260 mm/kmsan/kmsan.c:1084
    __msan_warning_32+0x6e/0xc0 mm/kmsan/kmsan_instr.c:686
    is_valid_ether_addr include/linux/etherdevice.h:199 [inline]
    eth_prepare_mac_addr_change net/ethernet/eth.c:275 [inline]
    eth_mac_addr+0x203/0x2b0 net/ethernet/eth.c:308
    dev_set_mac_address+0x261/0x530 net/core/dev.c:7157
    do_setlink+0xbc3/0x5fc0 net/core/rtnetlink.c:2317
    rtnl_group_changelink net/core/rtnetlink.c:2824 [inline]
    rtnl_newlink+0x1fe9/0x37a0 net/core/rtnetlink.c:2976
    rtnetlink_rcv_msg+0xa32/0x1560 net/core/rtnetlink.c:4646
    netlink_rcv_skb+0x378/0x600 net/netlink/af_netlink.c:2448
    rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4664
    netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
    netlink_unicast+0x1678/0x1750 net/netlink/af_netlink.c:1336
    netlink_sendmsg+0x104f/0x1350 net/netlink/af_netlink.c:1901
    sock_sendmsg_nosec net/socket.c:629 [inline]
    sock_sendmsg net/socket.c:639 [inline]
    ___sys_sendmsg+0xec0/0x1310 net/socket.c:2117
    __sys_sendmsg net/socket.c:2155 [inline]
    __do_sys_sendmsg net/socket.c:2164 [inline]
    __se_sys_sendmsg net/socket.c:2162 [inline]
    __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162
    do_syscall_64+0x152/0x230 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0033:0x455a09
    RSP: 002b:00007fc07480ec68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
    RAX: ffffffffffffffda RBX: 00007fc07480f6d4 RCX: 0000000000455a09
    RDX: 0000000000000000 RSI: 00000000200003c0 RDI: 0000000000000014
    RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
    R13: 00000000000005d0 R14: 00000000006fdc20 R15: 0000000000000000

    Uninit was stored to memory at:
    kmsan_save_stack_with_flags mm/kmsan/kmsan.c:279 [inline]
    kmsan_save_stack mm/kmsan/kmsan.c:294 [inline]
    kmsan_internal_chain_origin+0x12b/0x210 mm/kmsan/kmsan.c:685
    kmsan_memcpy_origins+0x11d/0x170 mm/kmsan/kmsan.c:527
    __msan_memcpy+0x109/0x160 mm/kmsan/kmsan_instr.c:478
    do_setlink+0xb84/0x5fc0 net/core/rtnetlink.c:2315
    rtnl_group_changelink net/core/rtnetlink.c:2824 [inline]
    rtnl_newlink+0x1fe9/0x37a0 net/core/rtnetlink.c:2976
    rtnetlink_rcv_msg+0xa32/0x1560 net/core/rtnetlink.c:4646
    netlink_rcv_skb+0x378/0x600 net/netlink/af_netlink.c:2448
    rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4664
    netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
    netlink_unicast+0x1678/0x1750 net/netlink/af_netlink.c:1336
    netlink_sendmsg+0x104f/0x1350 net/netlink/af_netlink.c:1901
    sock_sendmsg_nosec net/socket.c:629 [inline]
    sock_sendmsg net/socket.c:639 [inline]
    ___sys_sendmsg+0xec0/0x1310 net/socket.c:2117
    __sys_sendmsg net/socket.c:2155 [inline]
    __do_sys_sendmsg net/socket.c:2164 [inline]
    __se_sys_sendmsg net/socket.c:2162 [inline]
    __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162
    do_syscall_64+0x152/0x230 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
    Uninit was created at:
    kmsan_save_stack_with_flags mm/kmsan/kmsan.c:279 [inline]
    kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:189
    kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:315
    kmsan_slab_alloc+0x10/0x20 mm/kmsan/kmsan.c:322
    slab_post_alloc_hook mm/slab.h:446 [inline]
    slab_alloc_node mm/slub.c:2753 [inline]
    __kmalloc_node_track_caller+0xb32/0x11b0 mm/slub.c:4395
    __kmalloc_reserve net/core/skbuff.c:138 [inline]
    __alloc_skb+0x2cb/0x9e0 net/core/skbuff.c:206
    alloc_skb include/linux/skbuff.h:988 [inline]
    netlink_alloc_large_skb net/netlink/af_netlink.c:1182 [inline]
    netlink_sendmsg+0x76e/0x1350 net/netlink/af_netlink.c:1876
    sock_sendmsg_nosec net/socket.c:629 [inline]
    sock_sendmsg net/socket.c:639 [inline]
    ___sys_sendmsg+0xec0/0x1310 net/socket.c:2117
    __sys_sendmsg net/socket.c:2155 [inline]
    __do_sys_sendmsg net/socket.c:2164 [inline]
    __se_sys_sendmsg net/socket.c:2162 [inline]
    __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162
    do_syscall_64+0x152/0x230 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Fixes: e7ed828f10bd ("netlink: support setting devgroup parameters")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Cc: Dmitry Vyukov
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit 664088f8d68178809b848ca450f2797efb34e8e7 ]

    This patch reorders the error cases in showing the XPS configuration so
    that we hold off on memory allocation until after we have verified that we
    can support XPS on a given ring.

    Fixes: 184c449f91fe ("net: Add support for XPS with QoS via traffic classes")
    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Alexander Duyck
     
  • [ Upstream commit fa1be7e01ea863e911349e30456706749518eeab ]

    Some of the code paths calculating the flow hash for IPv6 use the
    flowlabel member of struct flowi6 which, despite its name, encodes both
    flow label and traffic class. If the traffic class changes within a TCP
    connection (as e.g. ssh does), the ECMP route can switch between paths.
    It's also inconsistent with other code paths where ip6_flowlabel()
    (returning only the flow label) is used to feed the key.

    Use only flow label everywhere, including one place where hash key is set
    using ip6_flowinfo().

    Fixes: 51ebd3181572 ("ipv6: add support of equal cost multipath (ECMP)")
    Fixes: f70ea018da06 ("net: Add functions to get skb->hash based on flow structures")
    Signed-off-by: Michal Kubecek
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Michal Kubecek
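
A minimal Python sketch of the flow-label point made in this commit, not kernel code. It assumes the host-order masks 0x0FFFFFFF (traffic class plus flow label) and 0x000FFFFF (flow label only). `make_flowinfo`, `pick_path` and the CRC-based hash are hypothetical stand-ins for the kernel's flow hash.

```python
import zlib

FLOWINFO_MASK = 0x0FFFFFFF   # traffic class (8 bits) + flow label (20 bits)
FLOWLABEL_MASK = 0x000FFFFF  # flow label only (low 20 bits)


def make_flowinfo(tclass: int, label: int) -> int:
    """Pack traffic class and flow label as they sit in the IPv6 header."""
    return ((tclass & 0xFF) << 20) | (label & FLOWLABEL_MASK)


def pick_path(key: int, n_paths: int) -> int:
    """Hypothetical ECMP selector: hash the key, then pick one of n_paths."""
    return zlib.crc32(key.to_bytes(4, "big")) % n_paths


# Same connection and flow label, but the traffic class changes mid-stream.
before = make_flowinfo(tclass=0x00, label=0xABCDE)
after = make_flowinfo(tclass=0x10, label=0xABCDE)

# Keying on the whole field makes the hash input change with the traffic class.
keys_differ = (before & FLOWINFO_MASK) != (after & FLOWINFO_MASK)

# Keying on the flow label alone keeps the input, and so the path, stable.
stable = pick_path(before & FLOWLABEL_MASK, 4) == pick_path(after & FLOWLABEL_MASK, 4)
```

Whether two different keys land on different paths depends on the hash, so the sketch only claims that the label-only key cannot change when the traffic class does.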
     

30 May, 2018

3 commits

  • [ Upstream commit ae4745730cf8e693d354ccd4dbaf59ea440c09a9 ]

    In some situations vlan packets do not have ethernet headers. One example
    is packets from tun devices. Users can specify vlan protocol in tun_pi
    field instead of IP protocol, and skb_vlan_untag() attempts to untag such
    packets.

    skb_vlan_untag() (more precisely, skb_reorder_vlan_header() called by it)
    however did not expect packets without ethernet headers, so in such a case
    size argument for memmove() underflowed and triggered crash.

    ====
    BUG: unable to handle kernel paging request at ffff8801cccb8000
    IP: __memmove+0x24/0x1a0 arch/x86/lib/memmove_64.S:43
    PGD 9cee067 P4D 9cee067 PUD 1d9401063 PMD 1cccb7063 PTE 2810100028101
    Oops: 000b [#1] SMP KASAN
    Dumping ftrace buffer:
    (ftrace buffer empty)
    Modules linked in:
    CPU: 1 PID: 17663 Comm: syz-executor2 Not tainted 4.16.0-rc7+ #368
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:__memmove+0x24/0x1a0 arch/x86/lib/memmove_64.S:43
    RSP: 0018:ffff8801cc046e28 EFLAGS: 00010287
    RAX: ffff8801ccc244c4 RBX: fffffffffffffffe RCX: fffffffffff6c4c2
    RDX: fffffffffffffffe RSI: ffff8801cccb7ffc RDI: ffff8801cccb8000
    RBP: ffff8801cc046e48 R08: ffff8801ccc244be R09: ffffed0039984899
    R10: 0000000000000001 R11: ffffed0039984898 R12: ffff8801ccc244c4
    R13: ffff8801ccc244c0 R14: ffff8801d96b7c06 R15: ffff8801d96b7b40
    FS: 00007febd562d700(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ffff8801cccb8000 CR3: 00000001ccb2f006 CR4: 00000000001606e0
    DR0: 0000000020000000 DR1: 0000000020000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
    Call Trace:
    memmove include/linux/string.h:360 [inline]
    skb_reorder_vlan_header net/core/skbuff.c:5031 [inline]
    skb_vlan_untag+0x470/0xc40 net/core/skbuff.c:5061
    __netif_receive_skb_core+0x119c/0x3460 net/core/dev.c:4460
    __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4627
    netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4701
    netif_receive_skb+0xae/0x390 net/core/dev.c:4725
    tun_rx_batched.isra.50+0x5ee/0x870 drivers/net/tun.c:1555
    tun_get_user+0x299e/0x3c20 drivers/net/tun.c:1962
    tun_chr_write_iter+0xb9/0x160 drivers/net/tun.c:1990
    call_write_iter include/linux/fs.h:1782 [inline]
    new_sync_write fs/read_write.c:469 [inline]
    __vfs_write+0x684/0x970 fs/read_write.c:482
    vfs_write+0x189/0x510 fs/read_write.c:544
    SYSC_write fs/read_write.c:589 [inline]
    SyS_write+0xef/0x220 fs/read_write.c:581
    do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x42/0xb7
    RIP: 0033:0x454879
    RSP: 002b:00007febd562cc68 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
    RAX: ffffffffffffffda RBX: 00007febd562d6d4 RCX: 0000000000454879
    RDX: 0000000000000157 RSI: 0000000020000180 RDI: 0000000000000014
    RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
    R13: 00000000000006b0 R14: 00000000006fc120 R15: 0000000000000000
    Code: 90 90 90 90 90 90 90 48 89 f8 48 83 fa 20 0f 82 03 01 00 00 48 39 fe 7d 0f 49 89 f0 49 01 d0 49 39 f8 0f 8f 9f 00 00 00 48 89 d1 a4 c3 48 81 fa a8 02 00 00 72 05 40 38 fe 74 3b 48 83 ea 20
    RIP: __memmove+0x24/0x1a0 arch/x86/lib/memmove_64.S:43 RSP: ffff8801cc046e28
    CR2: ffff8801cccb8000
    ====

    We don't need to copy headers for packets that have no headers preceding
    the vlan header, so skip memmove() in that case.

    Fixes: 4bbb3e0e8239 ("net: Fix vlan untag for bridge and vlan_dev with reorder_hdr off")
    Reported-by: Eric Dumazet
    Signed-off-by: Toshiaki Makita
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Toshiaki Makita
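
The underflow in this commit can be reproduced with plain arithmetic. The sketch below is a Python model, not kernel code. It assumes the copy size has the shape `mac_len - VLAN_HLEN - ETH_TLEN`, and that for a tun packet without an ethernet header only the 4 pulled vlan bytes lie between mac_header and data. Both function names are hypothetical.

```python
VLAN_HLEN = 4            # bytes in one vlan tag (TPID + TCI)
ETH_TLEN = 2             # bytes in the ethertype field
U64 = (1 << 64) - 1      # mask that models a 64-bit size_t


def memmove_size(mac_len: int) -> int:
    """Assumed size expression, wrapped to 64 bits like a C size_t."""
    return (mac_len - VLAN_HLEN - ETH_TLEN) & U64


def memmove_size_fixed(mac_len: int) -> int:
    """Skip the copy (size 0) when nothing precedes the vlan header."""
    if mac_len <= VLAN_HLEN + ETH_TLEN:
        return 0
    return mac_len - VLAN_HLEN - ETH_TLEN


# Ordinary tagged frame: addresses + TPID + TCI + type = 18 bytes before data.
normal = memmove_size(18)
# Packet with no ethernet header: the subtraction goes negative and wraps.
wrapped = memmove_size(4)
```

Under these assumptions `wrapped` is 0xfffffffffffffffe, the same value the oops above shows in RDX, the register that carries the memmove() length on x86-64.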
     
  • [ Upstream commit 4bbb3e0e8239f9079bf1fe20b3c0cb598714ae61 ]

    When we have a bridge with vlan_filtering on and a vlan device on top of
    it, packets would be corrupted in skb_vlan_untag() called from
    br_dev_xmit().

    The problem sits in skb_reorder_vlan_header() used in skb_vlan_untag(),
    which makes use of skb->mac_len. In this function mac_len is meant for
    handling rx path with vlan devices with reorder_header disabled, but in
    tx path mac_len is typically 0 and cannot be used, which is the problem
    in this case.

    The current code does not even handle the rx path properly (skb_vlan_untag()
    called from __netif_receive_skb_core()) when reorder_header is off.

    In rx path single tag case, it works as follows:

    - Before skb_reorder_vlan_header()

    mac_header                               data
    v                                        v
    +-------------------+-------------+------+----
    |        ETH        |    VLAN     | ETH  |
    |       ADDRS       | TPID | TCI  | TYPE |
    +-------------------+-------------+------+----
                        ^^^^^^^^^^^^^^^
                         to be removed

    - After skb_reorder_vlan_header()

    mac_header                 data
    v                          v
    +-------------------+------+----
    |        ETH        | ETH  |
    |       ADDRS       | TYPE |
    +-------------------+------+----

    This is ok, but in rx double tag case, it corrupts packets:

    - Before skb_reorder_vlan_header()

    mac_header                                             data
    v                                                      v
    +-------------------+-------------+-------------+------+----
    |        ETH        |    VLAN     |    VLAN     | ETH  |
    |       ADDRS       | TPID | TCI  | TPID | TCI  | TYPE |
    +-------------------+-------------+-------------+------+----
                                      ^^^^^^^^^^^^^^^
                                     should be removed
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                          actually will be removed

    - After skb_reorder_vlan_header()

    mac_header                       data
    v                                v
          +-------------------+------+----
          |        ETH        | ETH  |
          |       ADDRS       | TYPE |
          +-------------------+------+----

    So both vlan tags are removed, while only the inner one should be
    removed, and mac_header (and mac_len) is left broken.

    skb_vlan_untag() is meant for removing the vlan header at (skb->data - 2),
    so use skb->data and skb->mac_header to calculate the right offset.

    Reported-by: Brandon Carpenter
    Fixes: a6e18ff11170 ("vlan: Fix untag operations of stacked vlans with REORDER_HEADER off")
    Signed-off-by: Toshiaki Makita
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Toshiaki Makita
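
A byte-level Python model of the double-tag case described in this commit, offered as a sketch rather than the kernel implementation. The "old" offsets (source at `data - mac_len - VLAN_HLEN`, destination at `data - ETH_HLEN`) and the stored `mac_len` of 18 are assumptions chosen to match the commit's description. The tag values and function names are hypothetical.

```python
ETH_ALEN, ETH_HLEN, VLAN_HLEN, ETH_TLEN = 6, 14, 4, 2

ADDRS = bytes(range(1, 13))            # 12 bytes of dst + src MAC
OUTER = bytes.fromhex("88a80064")      # outer tag: TPID 0x88a8, TCI 100
INNER = bytes.fromhex("810000c8")      # inner tag: TPID 0x8100, TCI 200
TYPE = bytes.fromhex("0800")           # encapsulated ethertype
FRAME = ADDRS + OUTER + INNER + TYPE + b"payload"
DATA = len(ADDRS + OUTER + INNER + TYPE)   # models skb->data, just past TYPE


def reorder_with_mac_len(frame: bytes, mac_len: int) -> bytes:
    """Assumed old behaviour: offsets derived from a stored mac_len."""
    buf = bytearray(frame)
    src = DATA - mac_len - VLAN_HLEN
    dst = DATA - ETH_HLEN
    n = 2 * ETH_ALEN
    buf[dst:dst + n] = buf[src:src + n]    # slice copy is overlap-safe
    return bytes(buf[dst:DATA])            # header found just before data


def reorder_with_pointers(frame: bytes, mac_header: int = 0) -> bytes:
    """Behaviour the commit describes: use data and mac_header directly."""
    buf = bytearray(frame)
    n = (DATA - mac_header) - VLAN_HLEN - ETH_TLEN
    start = mac_header + VLAN_HLEN
    buf[start:start + n] = buf[mac_header:mac_header + n]
    return bytes(buf[start:DATA])


old = reorder_with_mac_len(FRAME, mac_len=18)   # both tags are gone
new = reorder_with_pointers(FRAME)              # only the inner tag is gone
```

In this model `old` is the addresses followed by the ethertype, while `new` keeps the outer tag between them.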
     
  • [ Upstream commit a6d50512b4d86ecd9f5952525e454583be1c3b14 ]

    If ethtool_ops->get_fecparam returns an error, pass that error on to the
    user, rather than ignoring it.

    Fixes: 1a5f3da20bd9 ("net: ethtool: add support for forward error correction modes")
    Signed-off-by: Edward Cree
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Edward Cree
     

25 May, 2018

2 commits

  • [ Upstream commit 9709020c86f6bf8439ca3effc58cfca49a5de192 ]

    We must not call sock_diag_has_destroy_listeners(sk) on a socket
    that has no reference on net structure.

    BUG: KASAN: use-after-free in sock_diag_has_destroy_listeners include/linux/sock_diag.h:75 [inline]
    BUG: KASAN: use-after-free in __sk_free+0x329/0x340 net/core/sock.c:1609
    Read of size 8 at addr ffff88018a02e3a0 by task swapper/1/0

    CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.17.0-rc5+ #54
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:

    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x1b9/0x294 lib/dump_stack.c:113
    print_address_description+0x6c/0x20b mm/kasan/report.c:256
    kasan_report_error mm/kasan/report.c:354 [inline]
    kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
    __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
    sock_diag_has_destroy_listeners include/linux/sock_diag.h:75 [inline]
    __sk_free+0x329/0x340 net/core/sock.c:1609
    sk_free+0x42/0x50 net/core/sock.c:1623
    sock_put include/net/sock.h:1664 [inline]
    reqsk_free include/net/request_sock.h:116 [inline]
    reqsk_put include/net/request_sock.h:124 [inline]
    inet_csk_reqsk_queue_drop_and_put net/ipv4/inet_connection_sock.c:672 [inline]
    reqsk_timer_handler+0xe27/0x10e0 net/ipv4/inet_connection_sock.c:739
    call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
    expire_timers kernel/time/timer.c:1363 [inline]
    __run_timers+0x79e/0xc50 kernel/time/timer.c:1666
    run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
    __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285
    invoke_softirq kernel/softirq.c:365 [inline]
    irq_exit+0x1d1/0x200 kernel/softirq.c:405
    exiting_irq arch/x86/include/asm/apic.h:525 [inline]
    smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052
    apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863

    RIP: 0010:native_safe_halt+0x6/0x10 arch/x86/include/asm/irqflags.h:54
    RSP: 0018:ffff8801d9ae7c38 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
    RAX: dffffc0000000000 RBX: 1ffff1003b35cf8a RCX: 0000000000000000
    RDX: 1ffffffff11a30d0 RSI: 0000000000000001 RDI: ffffffff88d18680
    RBP: ffff8801d9ae7c38 R08: ffffed003b5e46c3 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
    R13: ffff8801d9ae7cf0 R14: ffffffff897bef20 R15: 0000000000000000
    arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline]
    default_idle+0xc2/0x440 arch/x86/kernel/process.c:354
    arch_cpu_idle+0x10/0x20 arch/x86/kernel/process.c:345
    default_idle_call+0x6d/0x90 kernel/sched/idle.c:93
    cpuidle_idle_call kernel/sched/idle.c:153 [inline]
    do_idle+0x395/0x560 kernel/sched/idle.c:262
    cpu_startup_entry+0x104/0x120 kernel/sched/idle.c:368
    start_secondary+0x426/0x5b0 arch/x86/kernel/smpboot.c:269
    secondary_startup_64+0xa5/0xb0 arch/x86/kernel/head_64.S:242

    Allocated by task 4557:
    save_stack+0x43/0xd0 mm/kasan/kasan.c:448
    set_track mm/kasan/kasan.c:460 [inline]
    kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
    kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
    kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554
    kmem_cache_zalloc include/linux/slab.h:691 [inline]
    net_alloc net/core/net_namespace.c:383 [inline]
    copy_net_ns+0x159/0x4c0 net/core/net_namespace.c:423
    create_new_namespaces+0x69d/0x8f0 kernel/nsproxy.c:107
    unshare_nsproxy_namespaces+0xc3/0x1f0 kernel/nsproxy.c:206
    ksys_unshare+0x708/0xf90 kernel/fork.c:2408
    __do_sys_unshare kernel/fork.c:2476 [inline]
    __se_sys_unshare kernel/fork.c:2474 [inline]
    __x64_sys_unshare+0x31/0x40 kernel/fork.c:2474
    do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Freed by task 69:
    save_stack+0x43/0xd0 mm/kasan/kasan.c:448
    set_track mm/kasan/kasan.c:460 [inline]
    __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
    kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
    __cache_free mm/slab.c:3498 [inline]
    kmem_cache_free+0x86/0x2d0 mm/slab.c:3756
    net_free net/core/net_namespace.c:399 [inline]
    net_drop_ns.part.14+0x11a/0x130 net/core/net_namespace.c:406
    net_drop_ns net/core/net_namespace.c:405 [inline]
    cleanup_net+0x6a1/0xb20 net/core/net_namespace.c:541
    process_one_work+0xc1e/0x1b50 kernel/workqueue.c:2145
    worker_thread+0x1cc/0x1440 kernel/workqueue.c:2279
    kthread+0x345/0x410 kernel/kthread.c:240
    ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412

    The buggy address belongs to the object at ffff88018a02c140
    which belongs to the cache net_namespace of size 8832
    The buggy address is located 8800 bytes inside of
    8832-byte region [ffff88018a02c140, ffff88018a02e3c0)
    The buggy address belongs to the page:
    page:ffffea0006280b00 count:1 mapcount:0 mapping:ffff88018a02c140 index:0x0 compound_mapcount: 0
    flags: 0x2fffc0000008100(slab|head)
    raw: 02fffc0000008100 ffff88018a02c140 0000000000000000 0000000100000001
    raw: ffffea00062a1320 ffffea0006268020 ffff8801d9bdde40 0000000000000000
    page dumped because: kasan: bad access detected

    Fixes: b922622ec6ef ("sock_diag: don't broadcast kernel sockets")
    Signed-off-by: Eric Dumazet
    Cc: Craig Gallek
    Reported-by: syzbot
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit 6358d49ac23995fdfe157cc8747ab0f274d3954b ]

    While removing queues from the XPS map, the individual CPU ID
    alone was used to index the CPUs map, this should be changed to also
    factor in the traffic class mapping for the CPU-to-queue lookup.

    Fixes: 184c449f91fe ("net: Add support for XPS with QoS via traffic classes")
    Signed-off-by: Amritha Nambiar
    Acked-by: Alexander Duyck
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Amritha Nambiar
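
A small Python sketch of the indexing issue in this commit, under the assumption that the per-CPU XPS map is a flat array with one slot per (CPU, traffic class) pair at index `cpu * num_tc + tc`. The helper names and the `use_tc` switch are hypothetical and exist only to contrast the two lookups.

```python
def xps_index(cpu: int, tc: int, num_tc: int) -> int:
    """Flat index of the (cpu, traffic class) slot."""
    return cpu * num_tc + tc


def remove_queue(cpu_map, cpu, num_tc, queue, use_tc=True):
    """Remove `queue` from every traffic-class slot that belongs to `cpu`.

    With use_tc=False the bare CPU id is used as the index, which models
    the lookup the commit describes as wrong.
    """
    removed = 0
    for tc in range(num_tc):
        idx = xps_index(cpu, tc, num_tc) if use_tc else cpu
        if queue in cpu_map[idx]:
            cpu_map[idx].remove(queue)
            removed += 1
    return removed


# Two CPUs, two traffic classes: slots are cpu0/tc0, cpu0/tc1, cpu1/tc0, cpu1/tc1.
cpu_map = [[0], [1], [2], [3]]
missed = remove_queue(cpu_map, cpu=1, num_tc=2, queue=3, use_tc=False)
found = remove_queue(cpu_map, cpu=1, num_tc=2, queue=3, use_tc=True)
```

Indexing by CPU id alone looks at slot 1 (cpu0/tc1) and never reaches the slots that belong to CPU 1.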
     

16 May, 2018

2 commits

  • commit 77d36398d99f2565c0a8d43a86fd520a82e64bb8 upstream.

    syzbot complained :

    BUG: KMSAN: uninit-value in memcmp+0x119/0x180 lib/string.c:861
    CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.16.0+ #82
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Workqueue: ipv6_addrconf addrconf_dad_work
    Call Trace:
    __dump_stack lib/dump_stack.c:17 [inline]
    dump_stack+0x185/0x1d0 lib/dump_stack.c:53
    kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
    __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676
    memcmp+0x119/0x180 lib/string.c:861
    __hw_addr_add_ex net/core/dev_addr_lists.c:60 [inline]
    __dev_mc_add+0x1c2/0x8e0 net/core/dev_addr_lists.c:670
    dev_mc_add+0x6d/0x80 net/core/dev_addr_lists.c:687
    igmp6_group_added+0x2db/0xa00 net/ipv6/mcast.c:662
    ipv6_dev_mc_inc+0xe9e/0x1130 net/ipv6/mcast.c:914
    addrconf_join_solict net/ipv6/addrconf.c:2078 [inline]
    addrconf_dad_begin net/ipv6/addrconf.c:3828 [inline]
    addrconf_dad_work+0x427/0x2150 net/ipv6/addrconf.c:3954
    process_one_work+0x12c6/0x1f60 kernel/workqueue.c:2113
    worker_thread+0x113c/0x24f0 kernel/workqueue.c:2247
    kthread+0x539/0x720 kernel/kthread.c:239

    Fixes: f001fde5eadd ("net: introduce a list of device addresses dev_addr_list (v6)")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • commit b13dda9f9aa7caceeee61c080c2e544d5f5d85e5 upstream.

    syzbot reported __skb_try_recv_from_queue() was using skb->peeked
    while it was potentially uninitialized.

    We need to clear it in __skb_clone().

    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     

29 Apr, 2018

3 commits

  • [ Upstream commit 7ce2367254e84753bceb07327aaf5c953cfce117 ]

    Syzkaller spotted an old bug which leads to reading skb beyond tail by 4
    bytes on vlan tagged packets.
    This happens because skb_vlan_tagged_multi() did not check
    skb_headlen.

    BUG: KMSAN: uninit-value in eth_type_vlan include/linux/if_vlan.h:283 [inline]
    BUG: KMSAN: uninit-value in skb_vlan_tagged_multi include/linux/if_vlan.h:656 [inline]
    BUG: KMSAN: uninit-value in vlan_features_check include/linux/if_vlan.h:672 [inline]
    BUG: KMSAN: uninit-value in dflt_features_check net/core/dev.c:2949 [inline]
    BUG: KMSAN: uninit-value in netif_skb_features+0xd1b/0xdc0 net/core/dev.c:3009
    CPU: 1 PID: 3582 Comm: syzkaller435149 Not tainted 4.16.0+ #82
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:17 [inline]
    dump_stack+0x185/0x1d0 lib/dump_stack.c:53
    kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
    __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676
    eth_type_vlan include/linux/if_vlan.h:283 [inline]
    skb_vlan_tagged_multi include/linux/if_vlan.h:656 [inline]
    vlan_features_check include/linux/if_vlan.h:672 [inline]
    dflt_features_check net/core/dev.c:2949 [inline]
    netif_skb_features+0xd1b/0xdc0 net/core/dev.c:3009
    validate_xmit_skb+0x89/0x1320 net/core/dev.c:3084
    __dev_queue_xmit+0x1cb2/0x2b60 net/core/dev.c:3549
    dev_queue_xmit+0x4b/0x60 net/core/dev.c:3590
    packet_snd net/packet/af_packet.c:2944 [inline]
    packet_sendmsg+0x7c57/0x8a10 net/packet/af_packet.c:2969
    sock_sendmsg_nosec net/socket.c:630 [inline]
    sock_sendmsg net/socket.c:640 [inline]
    sock_write_iter+0x3b9/0x470 net/socket.c:909
    do_iter_readv_writev+0x7bb/0x970 include/linux/fs.h:1776
    do_iter_write+0x30d/0xd40 fs/read_write.c:932
    vfs_writev fs/read_write.c:977 [inline]
    do_writev+0x3c9/0x830 fs/read_write.c:1012
    SYSC_writev+0x9b/0xb0 fs/read_write.c:1085
    SyS_writev+0x56/0x80 fs/read_write.c:1082
    do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x3d/0xa2
    RIP: 0033:0x43ffa9
    RSP: 002b:00007fff2cff3948 EFLAGS: 00000217 ORIG_RAX: 0000000000000014
    RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 000000000043ffa9
    RDX: 0000000000000001 RSI: 0000000020000080 RDI: 0000000000000003
    RBP: 00000000006cb018 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000217 R12: 00000000004018d0
    R13: 0000000000401960 R14: 0000000000000000 R15: 0000000000000000

    Uninit was created at:
    kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
    kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188
    kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314
    kmsan_slab_alloc+0x11/0x20 mm/kmsan/kmsan.c:321
    slab_post_alloc_hook mm/slab.h:445 [inline]
    slab_alloc_node mm/slub.c:2737 [inline]
    __kmalloc_node_track_caller+0xaed/0x11c0 mm/slub.c:4369
    __kmalloc_reserve net/core/skbuff.c:138 [inline]
    __alloc_skb+0x2cf/0x9f0 net/core/skbuff.c:206
    alloc_skb include/linux/skbuff.h:984 [inline]
    alloc_skb_with_frags+0x1d4/0xb20 net/core/skbuff.c:5234
    sock_alloc_send_pskb+0xb56/0x1190 net/core/sock.c:2085
    packet_alloc_skb net/packet/af_packet.c:2803 [inline]
    packet_snd net/packet/af_packet.c:2894 [inline]
    packet_sendmsg+0x6444/0x8a10 net/packet/af_packet.c:2969
    sock_sendmsg_nosec net/socket.c:630 [inline]
    sock_sendmsg net/socket.c:640 [inline]
    sock_write_iter+0x3b9/0x470 net/socket.c:909
    do_iter_readv_writev+0x7bb/0x970 include/linux/fs.h:1776
    do_iter_write+0x30d/0xd40 fs/read_write.c:932
    vfs_writev fs/read_write.c:977 [inline]
    do_writev+0x3c9/0x830 fs/read_write.c:1012
    SYSC_writev+0x9b/0xb0 fs/read_write.c:1085
    SyS_writev+0x56/0x80 fs/read_write.c:1082
    do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x3d/0xa2

    Fixes: 58e998c6d239 ("offloading: Force software GSO for multiple vlan tags.")
    Reported-and-tested-by: syzbot+0bbe42c764feafa82c5a@syzkaller.appspotmail.com
    Signed-off-by: Toshiaki Makita
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Toshiaki Makita
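
The missing length check in this commit can be modelled in a few lines of Python. This is a simplified sketch: it ignores hardware-accelerated tags and treats `linear` as the bytes available in the skb head. The function names are hypothetical; the constants are the usual ethertype and header-length values.

```python
import struct

ETH_P_8021Q, ETH_P_8021AD = 0x8100, 0x88A8
ETH_HLEN, VLAN_ETH_HLEN = 14, 18


def is_vlan_proto(proto: int) -> bool:
    return proto in (ETH_P_8021Q, ETH_P_8021AD)


def vlan_tagged_multi(linear: bytes) -> bool:
    """True if the frame carries at least two vlan tags."""
    if len(linear) < ETH_HLEN:
        return False
    (outer,) = struct.unpack_from("!H", linear, 12)
    if not is_vlan_proto(outer):
        return False
    # Modelled fix: the inner ethertype sits at offsets 16..17, so 18 bytes
    # must be present before it is read.
    if len(linear) < VLAN_ETH_HLEN:
        return False
    (inner,) = struct.unpack_from("!H", linear, 16)
    return is_vlan_proto(inner)


addrs = bytes(12)
truncated = addrs + bytes.fromhex("8100")                   # 14 bytes only
single = addrs + bytes.fromhex("81000001" "0800")
double = addrs + bytes.fromhex("88a80001" "81000002" "0800")
```

Without the second length check, the 14-byte frame would trigger a read 4 bytes past the end of the buffer, which is the overrun the commit describes.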
     
  • [ Upstream commit 53b76cdf7e8fecec1d09e38aad2f8579882591a8 ]

    When coming from ndisc_netdev_event() in net/ipv6/ndisc.c,
    neigh_ifdown() is called with &nd_tbl, locking this while
    clearing the proxy neighbor entries when e.g. deleting an
    interface. Calling the table's pndisc_destructor() with the
    lock still held, however, can cause a deadlock: When a
    multicast listener is available an IGMP packet of type
    ICMPV6_MGM_REDUCTION may be sent out. When reaching
    ip6_finish_output2(), if no neighbor entry for the target
    address is found, __neigh_create() is called with &nd_tbl,
    which it'll want to lock.

    Move the elements into their own list, then unlock the table
    and perform the destruction.

    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199289
    Fixes: 6fd6ce2056de ("ipv6: Do not depend on rt->n in ip6_finish_output2().")
    Signed-off-by: Wolfgang Bumiller
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Wolfgang Bumiller
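
The locking pattern this commit describes (detach under the lock, destroy after unlocking) can be sketched in Python. Everything here is a hypothetical stand-in: `table_lock` plays the role of the neighbour table lock and `destructor` plays the role of a callback that may need that lock again.

```python
import threading

table_lock = threading.Lock()       # non-reentrant, like the table lock
proxy_entries = ["a", "b", "c"]     # stands in for the proxy neighbour list
destroyed = []


def destructor(entry):
    """May re-enter the table, so it needs the lock to be free."""
    if not table_lock.acquire(timeout=1):
        raise RuntimeError("would deadlock: table lock still held")
    try:
        destroyed.append(entry)
    finally:
        table_lock.release()


def ifdown():
    # Step 1: move the entries onto a private list while holding the lock.
    with table_lock:
        freelist = list(proxy_entries)
        proxy_entries.clear()
    # Step 2: run the destructors only after the lock has been dropped.
    for entry in freelist:
        destructor(entry)


ifdown()
```

Calling `destructor` inside the `with table_lock:` block instead would hit the timeout, which is the model's version of the deadlock.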
     
  • [ Upstream commit 7dd07c143a4b54d050e748bee4b4b9e94a7b1744 ]

    Since neigh_dump_table() calls nlmsg_parse() without giving policy
    constraints, attributes can have arbitrary size that we must validate.

    Reported by syzbot/KMSAN :

    BUG: KMSAN: uninit-value in neigh_master_filtered net/core/neighbour.c:2292 [inline]
    BUG: KMSAN: uninit-value in neigh_dump_table net/core/neighbour.c:2348 [inline]
    BUG: KMSAN: uninit-value in neigh_dump_info+0x1af0/0x2250 net/core/neighbour.c:2438
    CPU: 1 PID: 3575 Comm: syzkaller268891 Not tainted 4.16.0+ #83
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:17 [inline]
    dump_stack+0x185/0x1d0 lib/dump_stack.c:53
    kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
    __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676
    neigh_master_filtered net/core/neighbour.c:2292 [inline]
    neigh_dump_table net/core/neighbour.c:2348 [inline]
    neigh_dump_info+0x1af0/0x2250 net/core/neighbour.c:2438
    netlink_dump+0x9ad/0x1540 net/netlink/af_netlink.c:2225
    __netlink_dump_start+0x1167/0x12a0 net/netlink/af_netlink.c:2322
    netlink_dump_start include/linux/netlink.h:214 [inline]
    rtnetlink_rcv_msg+0x1435/0x1560 net/core/rtnetlink.c:4598
    netlink_rcv_skb+0x355/0x5f0 net/netlink/af_netlink.c:2447
    rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4653
    netlink_unicast_kernel net/netlink/af_netlink.c:1311 [inline]
    netlink_unicast+0x1672/0x1750 net/netlink/af_netlink.c:1337
    netlink_sendmsg+0x1048/0x1310 net/netlink/af_netlink.c:1900
    sock_sendmsg_nosec net/socket.c:630 [inline]
    sock_sendmsg net/socket.c:640 [inline]
    ___sys_sendmsg+0xec0/0x1310 net/socket.c:2046
    __sys_sendmsg net/socket.c:2080 [inline]
    SYSC_sendmsg+0x2a3/0x3d0 net/socket.c:2091
    SyS_sendmsg+0x54/0x80 net/socket.c:2087
    do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x3d/0xa2
    RIP: 0033:0x43fed9
    RSP: 002b:00007ffddbee2798 EFLAGS: 00000213 ORIG_RAX: 000000000000002e
    RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 000000000043fed9
    RDX: 0000000000000000 RSI: 0000000020005000 RDI: 0000000000000003
    RBP: 00000000006ca018 R08: 00000000004002c8 R09: 00000000004002c8
    R10: 00000000004002c8 R11: 0000000000000213 R12: 0000000000401800
    R13: 0000000000401890 R14: 0000000000000000 R15: 0000000000000000

    Uninit was created at:
    kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
    kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188
    kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314
    kmsan_slab_alloc+0x11/0x20 mm/kmsan/kmsan.c:321
    slab_post_alloc_hook mm/slab.h:445 [inline]
    slab_alloc_node mm/slub.c:2737 [inline]
    __kmalloc_node_track_caller+0xaed/0x11c0 mm/slub.c:4369
    __kmalloc_reserve net/core/skbuff.c:138 [inline]
    __alloc_skb+0x2cf/0x9f0 net/core/skbuff.c:206
    alloc_skb include/linux/skbuff.h:984 [inline]
    netlink_alloc_large_skb net/netlink/af_netlink.c:1183 [inline]
    netlink_sendmsg+0x9a6/0x1310 net/netlink/af_netlink.c:1875
    sock_sendmsg_nosec net/socket.c:630 [inline]
    sock_sendmsg net/socket.c:640 [inline]
    ___sys_sendmsg+0xec0/0x1310 net/socket.c:2046
    __sys_sendmsg net/socket.c:2080 [inline]
    SYSC_sendmsg+0x2a3/0x3d0 net/socket.c:2091
    SyS_sendmsg+0x54/0x80 net/socket.c:2087
    do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x3d/0xa2

    Fixes: 21fdd092acc7 ("net: Add support for filtering neigh dump by master device")
    Signed-off-by: Eric Dumazet
    Cc: David Ahern
    Reported-by: syzbot
    Acked-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
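
A Python sketch of the validation this commit calls for: when attributes are parsed without a policy, the reader has to check the payload size itself before treating it as a u32. The parser below is a simplified model of the netlink attribute layout (2-byte length, 2-byte type, 4-byte alignment). The attribute number 9 is used purely for illustration.

```python
import struct

NLA_HDRLEN = 4


def parse_attrs(buf: bytes) -> dict:
    """Split an attribute stream into {type: payload}, with no size policy."""
    attrs, off = {}, 0
    while off + NLA_HDRLEN <= len(buf):
        nla_len, nla_type = struct.unpack_from("=HH", buf, off)
        if nla_len < NLA_HDRLEN or off + nla_len > len(buf):
            break
        attrs[nla_type] = buf[off + NLA_HDRLEN:off + nla_len]
        off += (nla_len + 3) & ~3          # attributes are 4-byte aligned
    return attrs


def get_u32(attrs: dict, attr_type: int):
    """Return the attribute as u32, or None unless it is exactly 4 bytes."""
    payload = attrs.get(attr_type)
    if payload is None or len(payload) != 4:
        return None
    return struct.unpack("=I", payload)[0]


well_formed = struct.pack("=HHI", 8, 9, 42)   # header + 4-byte payload
empty = struct.pack("=HH", 4, 9)              # header only, no payload
```

Reading a u32 out of the `empty` attribute without the length check would consume bytes that were never supplied, which matches the uninit-value report above.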
     

12 Apr, 2018

2 commits

  • [ Upstream commit a9d48205d0aedda021fc3728972a9e9934c2b9de ]

    We want to use dev_valid_name() to validate tunnel names,
    so better use strnlen(name, IFNAMSIZ) than strlen(name) to make
    sure to not upset KASAN.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
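
To illustrate why a bounded scan matters in this commit, here is a Python model of strnlen() applied to a fixed-size name buffer. It is a sketch, assuming IFNAMSIZ is 16; `name_fits` is a hypothetical helper that covers only the length part of name validation.

```python
IFNAMSIZ = 16


def strnlen(buf: bytes, maxlen: int) -> int:
    """Length up to the first NUL, never looking past maxlen bytes."""
    window = buf[:maxlen]
    end = window.find(b"\0")
    return len(window) if end < 0 else end


def name_fits(buf: bytes) -> bool:
    """Non-empty, and NUL-terminated within IFNAMSIZ bytes."""
    return 0 < strnlen(buf, IFNAMSIZ) < IFNAMSIZ


ok = name_fits(b"eth0\0")
too_long = name_fits(b"x" * 64)     # no terminator inside the window
```

An unbounded strlen() on the second buffer would keep reading until it found a NUL somewhere past the 16-byte field, which is the kind of access KASAN flags.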
     
  • [ Upstream commit 1dfe82ebd7d8fd43dba9948fdfb31f145014baa0 ]

    skb mac header is not necessarily set at the time skb_network_protocol()
    is called. Use skb->data instead.

    BUG: KASAN: slab-out-of-bounds in skb_network_protocol+0x46b/0x4b0 net/core/dev.c:2739
    Read of size 2 at addr ffff8801b3097a0b by task syz-executor5/14242

    CPU: 1 PID: 14242 Comm: syz-executor5 Not tainted 4.16.0-rc6+ #280
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:17 [inline]
    dump_stack+0x194/0x24d lib/dump_stack.c:53
    print_address_description+0x73/0x250 mm/kasan/report.c:256
    kasan_report_error mm/kasan/report.c:354 [inline]
    kasan_report+0x23c/0x360 mm/kasan/report.c:412
    __asan_report_load_n_noabort+0xf/0x20 mm/kasan/report.c:443
    skb_network_protocol+0x46b/0x4b0 net/core/dev.c:2739
    harmonize_features net/core/dev.c:2924 [inline]
    netif_skb_features+0x509/0x9b0 net/core/dev.c:3011
    validate_xmit_skb+0x81/0xb00 net/core/dev.c:3084
    validate_xmit_skb_list+0xbf/0x120 net/core/dev.c:3142
    packet_direct_xmit+0x117/0x790 net/packet/af_packet.c:256
    packet_snd net/packet/af_packet.c:2944 [inline]
    packet_sendmsg+0x3aed/0x60b0 net/packet/af_packet.c:2969
    sock_sendmsg_nosec net/socket.c:629 [inline]
    sock_sendmsg+0xca/0x110 net/socket.c:639
    ___sys_sendmsg+0x767/0x8b0 net/socket.c:2047
    __sys_sendmsg+0xe5/0x210 net/socket.c:2081

    Fixes: 19acc327258a ("gso: Handle Trans-Ether-Bridging protocol in skb_network_protocol()")
    Signed-off-by: Eric Dumazet
    Cc: Pravin B Shelar
    Reported-by: syzbot
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     

01 Apr, 2018

3 commits

  • [ Upstream commit 6e5d58fdc9bedd0255a8781b258f10bbdc63e975 ]

    When errors are enqueued to the error queue via sock_queue_err_skb()
    function, it is possible that the waiting application is not notified.

    Calling 'sk->sk_data_ready()' would not notify applications that
    selected only POLLERR events in poll() (for example).

    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Reported-by: Randy E. Witt
    Reviewed-by: Eric Dumazet
    Signed-off-by: Vinicius Costa Gomes
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Vinicius Costa Gomes
     
  • [ Upstream commit 4dcb31d4649df36297296b819437709f5407059c ]

    Andrei Vagin reported a KASAN: slab-out-of-bounds error in
    skb_update_prio()

    Since SYNACK might be attached to a request socket, we need to
    get back to the listener socket.
    Since this listener is manipulated without locks, add const
    qualifiers to sock_cgroup_prioidx() so that the const can also
    be used in skb_update_prio()

    Also add the const qualifier to sock_cgroup_classid() for consistency.

    Fixes: ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead of listener")
    Signed-off-by: Eric Dumazet
    Reported-by: Andrei Vagin
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit 7fe4d6dcbcb43fe0282d4213fc52be178bb30e91 ]

    The current code performs unneeded free. Remove the redundant skb freeing
    during the error path.

    Fixes: 1555d204e743 ("devlink: Support for pipeline debug (dpipe)")
    Signed-off-by: Arkadi Sharshevsky
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Arkadi Sharshevsky
     

09 Mar, 2018

2 commits

  • [ Upstream commit a5f7add332b4ea6d4b9480971b3b0f5e66466ae9 ]

    pfifo_fast got percpu stats lately, uncovering a bug I introduced last
    year in linux-4.10.

    I missed the fact that we have to clear our temporary storage
    before calling __gnet_stats_copy_basic() in the case of percpu stats.

    Without this fix, rate estimators (tc qd replace dev xxx root est 1sec
    4sec pfifo_fast) are utterly broken.

    Fixes: 1c0d32fde5bd ("net_sched: gen_estimator: complete rewrite of rate estimators")
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
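
A Python sketch of the bug class this commit describes: a copy routine that accumulates per-CPU counters into its output gives wrong totals unless the temporary is zeroed first. The function and field names are hypothetical and only model the accumulate-into-scratch behaviour.

```python
def copy_basic_percpu(out: dict, percpu: list) -> None:
    """Accumulating copy: adds each CPU's counters onto whatever `out` holds."""
    for cpu in percpu:
        out["bytes"] += cpu["bytes"]
        out["packets"] += cpu["packets"]


def fetch_counters(scratch: dict, percpu: list, clear_first: bool) -> dict:
    """Read the current totals into `scratch`, optionally zeroing it first."""
    if clear_first:
        scratch["bytes"] = scratch["packets"] = 0
    copy_basic_percpu(scratch, percpu)
    return dict(scratch)


percpu = [{"bytes": 100, "packets": 1}, {"bytes": 300, "packets": 3}]

stale = {"bytes": 999, "packets": 9}      # leftover content in the temporary
wrong = fetch_counters(stale, percpu, clear_first=False)

stale = {"bytes": 999, "packets": 9}
right = fetch_counters(stale, percpu, clear_first=True)
```

A rate estimator fed the `wrong` totals computes its rates from numbers that include the leftover content.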
     
  • [ Upstream commit ac5b70198adc25c73fba28de4f78adcee8f6be0b ]

    netif_set_real_num_tx_queues() can be called when netdev is up.
    That usually happens when user requests change of number of
    channels/rings with ethtool -L. The procedure for changing
    the number of queues involves resetting the qdiscs and setting
    dev->num_tx_queues to the new value. When the new value is
    lower than the old one, extra care has to be taken to ensure
    ordering of accesses to the number of queues vs qdisc reset.

    Currently the queues are reset before new dev->num_tx_queues
    is assigned, leaving a window of time where packets can be
    enqueued onto the queues going down, leading to a likely
    crash in the drivers, since most drivers don't check if TX
    skbs are assigned to an active queue.

    Fixes: e6484930d7c7 ("net: allocate tx queues in register_netdevice")
    Signed-off-by: Jakub Kicinski
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Jakub Kicinski
     

25 Feb, 2018

2 commits

  • commit 40ca54e3a686f13117f3de0c443f8026dadf7c44 upstream.

    syzbot reported a lockdep splat in gen_new_estimator() /
    est_fetch_counters() when attempting to lock est->stats_lock.

    Since est_fetch_counters() is called from BH context from timer
    interrupt, we need to block BH as well when calling it from process
    context.

    Most qdiscs use per cpu counters and are immune to the problem,
    but net/sched/act_api.c and net/netfilter/xt_RATEEST.c are using
    a spinlock to protect their data. They both call gen_new_estimator()
    while object is created and not yet alive, so this bug could
    not trigger a deadlock, only a lockdep splat.

    Fixes: 1c0d32fde5bd ("net_sched: gen_estimator: complete rewrite of rate estimators")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Acked-by: Cong Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • commit 8d74e9f88d65af8bb2e095aff506aa6eac755ada upstream.

    skb_warn_bad_offload warns when packets enter the GSO stack that
    require skb_checksum_help or vice versa. Do not warn on arbitrary
    bad packets. Packet sockets can craft many. Syzkaller was able to
    demonstrate another one with eth_type games.

    In particular, suppress the warning when segmentation returns an
    error, which is for reasons other than checksum offload.

    See also commit 36c92474498a ("net: WARN if skb_checksum_help() is
    called on skb requiring segmentation") for context on this warning.

    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Willem de Bruijn
     

22 Feb, 2018

1 commit

  • commit 4950276672fce5c241857540f8561c440663673d upstream.

    Patch series "kmemcheck: kill kmemcheck", v2.

    As discussed at LSF/MM, kill kmemcheck.

    KASan is a replacement that is able to work without the limitations of
    kmemcheck (single CPU, slow). KASan is already upstream.

    We are also not aware of any users of kmemcheck (or users who don't
    consider KASan as a suitable replacement).

    The only objection was that since KASAN wasn't supported by all GCC
    versions provided by distros at that time we should hold off for 2
    years, and try again.

    Now that 2 years have passed, and all distros provide gcc that supports
    KASAN, kill kmemcheck again for the very same reasons.

    This patch (of 4):

    Remove kmemcheck annotations, and calls to kmemcheck from the kernel.

    [alexander.levin@verizon.com: correctly remove kmemcheck call from dma_map_sg_attrs]
    Link: http://lkml.kernel.org/r/20171012192151.26531-1-alexander.levin@verizon.com
    Link: http://lkml.kernel.org/r/20171007030159.22241-2-alexander.levin@verizon.com
    Signed-off-by: Sasha Levin
    Cc: Alexander Potapenko
    Cc: Eric W. Biederman
    Cc: Michal Hocko
    Cc: Pekka Enberg
    Cc: Steven Rostedt
    Cc: Tim Hansen
    Cc: Vegard Nossum
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Levin, Alexander (Sasha Levin)
     

13 Feb, 2018

1 commit

  • [ Upstream commit 4db428a7c9ab07e08783e0fcdc4ca0f555da0567 ]

    reuseport_add_sock() needs to deal with attaching a socket having
    its own sk_reuseport_cb, after a prior
    setsockopt(SO_ATTACH_REUSEPORT_?BPF)

    Without this fix, not only was a WARN_ONCE() issued, but we were also
    leaking memory.

    Thanks to syzbot and Eric Biggers for providing us nice C repros.

    ------------[ cut here ]------------
    socket already in reuseport group
    WARNING: CPU: 0 PID: 3496 at net/core/sock_reuseport.c:119 reuseport_add_sock+0x742/0x9b0 net/core/sock_reuseport.c:117
    Kernel panic - not syncing: panic_on_warn set ...

    CPU: 0 PID: 3496 Comm: syzkaller869503 Not tainted 4.15.0-rc6+ #245
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
      __dump_stack lib/dump_stack.c:17 [inline]
      dump_stack+0x194/0x257 lib/dump_stack.c:53
      panic+0x1e4/0x41c kernel/panic.c:183
      __warn+0x1dc/0x200 kernel/panic.c:547
      report_bug+0x211/0x2d0 lib/bug.c:184
      fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178
      fixup_bug arch/x86/kernel/traps.c:247 [inline]
      do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
      do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
      invalid_op+0x22/0x40 arch/x86/entry/entry_64.S:1079

    Fixes: ef456144da8e ("soreuseport: define reuseport groups")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot+c0ea2226f77a42936bf7@syzkaller.appspotmail.com
    Acked-by: Craig Gallek

    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
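
    The sequence described in the reuseport_add_sock() commit above can be
    sketched from user space. This is a minimal illustration, not the syzbot
    reproducer: the function name attach_then_bind is hypothetical, the numeric
    value 51 for SO_ATTACH_REUSEPORT_CBPF is an assumption taken from
    asm-generic/socket.h (other architectures may differ), and the one-instruction
    classic BPF program simply returns index 0.

```python
import ctypes
import socket
import struct
import sys

# Assumption: value from asm-generic/socket.h (x86, arm64); not exported
# by Python's socket module, and different on some architectures.
SO_ATTACH_REUSEPORT_CBPF = 51
BPF_RET_K = 0x06  # BPF_RET | BPF_K: return the constant k


def attach_then_bind():
    """Bind two SO_REUSEPORT UDP sockets to the same port, attaching a
    classic BPF program to the second socket *before* its bind().

    Returns (port_a, port_b), or None if the platform rejects any step.
    """
    reuseport = getattr(socket, "SO_REUSEPORT", None)
    if reuseport is None or not sys.platform.startswith("linux"):
        return None
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as a, \
             socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as b:
            # First socket: creates the reuseport group at bind() time.
            a.setsockopt(socket.SOL_SOCKET, reuseport, 1)
            a.bind(("127.0.0.1", 0))
            port_a = a.getsockname()[1]

            # Second socket: attaching a program while still unbound gives
            # it its own sk_reuseport_cb, per the commit message.
            b.setsockopt(socket.SOL_SOCKET, reuseport, 1)
            insn = struct.pack("HBBI", BPF_RET_K, 0, 0, 0)  # struct sock_filter
            prog = ctypes.create_string_buffer(insn, len(insn))
            # struct sock_fprog { unsigned short len; struct sock_filter *filter; }
            fprog = struct.pack("HP", 1, ctypes.addressof(prog))
            b.setsockopt(socket.SOL_SOCKET, SO_ATTACH_REUSEPORT_CBPF, fprog)

            # Joining the existing group is the path that reaches
            # reuseport_add_sock() with a socket that already has a cb.
            b.bind(("127.0.0.1", port_a))
            port_b = b.getsockname()[1]
            return port_a, port_b
    except OSError:
        return None


if __name__ == "__main__":
    print(attach_then_bind())
```

    On a kernel carrying the fix, the second bind() is expected to succeed and
    both sockets share one port. The sketch returns None instead of raising when
    the option or the platform is unavailable.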