01 Jun, 2014

40 commits

  • Greg Kroah-Hartman
     
  • [ Upstream commit e33d0ba8047b049c9262fdb1fcafb93cb52ceceb ]

    Recycling skb always had been very tough...

    This time it appears GRO layer can accumulate skb->truesize
    adjustments made by drivers when they attach a fragment to skb.

    skb_gro_receive() can only subtract from skb->truesize the used part
    of a fragment.

    I spotted this problem seeing TcpExtPruneCalled and
    TcpExtTCPRcvCollapsed that were unexpected with a recent kernel, where
    TCP receive window should be sized properly to accept traffic coming
    from a driver not overshooting skb->truesize.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit fbdc0ad095c0a299e9abf5d8ac8f58374951149a ]

    The value of itag is a random value from the stack, and may not be
    initialized by fib_validate_source(), which calls fib_combine_itag(), if
    CONFIG_IP_ROUTE_CLASSID is not set.

    This makes the cached dst unpredictable.

    Signed-off-by: Li RongQing
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Li RongQing
     
  • [ Upstream commit 4de462ab63e23953fd05da511aeb460ae10cc726 ]

    When GRE support was added in linux-3.14, CHECKSUM_COMPLETE handling
    broke on GRE+IPv6 because we did not update/use the appropriate csum :

    GRO layer is supposed to use/update NAPI_GRO_CB(skb)->csum instead of
    skb->csum

    Tested using a GRE tunnel and IPv6 traffic. GRO aggregation now happens
    at the first level (ethernet device) instead of being done in gre
    tunnel. Native IPv6+TCP is still properly aggregated.

    Fixes: bf5a755f5e918 ("net-gre-gro: Add GRE support to the GRO stack")
    Signed-off-by: Eric Dumazet
    Cc: Jerry Chu
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit bf63ac73b3e132e6bf0c8798aba7b277c3316e19 ]

    Kelly reported the following crash:

    IP: [] tcf_action_exec+0x46/0x90
    PGD 3009067 PUD 300c067 PMD 11ff30067 PTE 800000011634b060
    Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
    CPU: 1 PID: 639 Comm: dhclient Not tainted 3.15.0-rc4+ #342
    Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    task: ffff8801169ecd00 ti: ffff8800d21b8000 task.ti: ffff8800d21b8000
    RIP: 0010:[] [] tcf_action_exec+0x46/0x90
    RSP: 0018:ffff8800d21b9b90 EFLAGS: 00010283
    RAX: 00000000ffffffff RBX: ffff88011634b8e8 RCX: ffff8800cf7133d8
    RDX: ffff88011634b900 RSI: ffff8800cf7133e0 RDI: ffff8800d210f840
    RBP: ffff8800d21b9bb0 R08: ffffffff8287bf60 R09: 0000000000000001
    R10: ffff8800d2b22b24 R11: 0000000000000001 R12: ffff8800d210f840
    R13: ffff8800d21b9c50 R14: ffff8800cf7133e0 R15: ffff8800cad433d8
    FS: 00007f49723e1840(0000) GS:ffff88011a800000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ffff88011634b8f0 CR3: 00000000ce469000 CR4: 00000000000006e0
    Stack:
    ffff8800d2170188 ffff8800d210f840 ffff8800d2171b90 0000000000000000
    ffff8800d21b9be8 ffffffff817c55bb ffff8800d21b9c50 ffff8800d2171b90
    ffff8800d210f840 ffff8800d21b0300 ffff8800d21b9c50 ffff8800d21b9c18
    Call Trace:
    [] tcindex_classify+0x88/0x9b
    [] tc_classify_compat+0x3e/0x7b
    [] tc_classify+0x25/0x9f
    [] htb_enqueue+0x55/0x27a
    [] dsmark_enqueue+0x165/0x1a4
    [] __dev_queue_xmit+0x35e/0x536
    [] dev_queue_xmit+0x10/0x12
    [] packet_sendmsg+0xb26/0xb9a
    [] ? __lock_acquire+0x3ae/0xdf3
    [] __sock_sendmsg_nosec+0x25/0x27
    [] sock_aio_write+0xd0/0xe7
    [] do_sync_write+0x59/0x78
    [] vfs_write+0xb5/0x10a
    [] SyS_write+0x49/0x7f
    [] system_call_fastpath+0x16/0x1b

    This is because we memcpy struct tcindex_filter_result, which contains
    struct tcf_exts; obviously, struct list_head cannot simply be copied.
    This is a regression introduced by commit 33be627159913b094bb578
    (net_sched: act: use standard struct list_head).

    It's not very easy to fix it as the code is a mess:

    if (old_r)
        memcpy(&cr, r, sizeof(cr));
    else {
        memset(&cr, 0, sizeof(cr));
        tcf_exts_init(&cr.exts, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
    }
    ...
    tcf_exts_change(tp, &cr.exts, &e);
    ...
    memcpy(r, &cr, sizeof(cr));

    The above code should be equivalent to:

    tcindex_filter_result_init(&cr);
    if (old_r)
        cr.res = r->res;
    ...
    if (old_r)
        tcf_exts_change(tp, &r->exts, &e);
    else
        tcf_exts_change(tp, &cr.exts, &e);
    ...
    r->res = cr.res;

    After this change, there is no need to copy struct tcf_exts.

    It also fixes other places that zero structs containing struct tcf_exts.
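
    A minimal user-space sketch of why a struct with an embedded list head
    cannot be copied with memcpy(). The list type and struct demo_result are
    hypothetical stand-ins, not the kernel's definitions: an empty head points
    at itself, so a byte copy leaves the new head pointing into the old object.

```c
#include <string.h>

/* Minimal user-space stand-in for a circular doubly linked list head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* An empty list is one whose head points back at itself. */
static int list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Hypothetical result struct that embeds a list head. */
struct demo_result {
	int classid;
	struct list_head actions;
};

/* Broken: the copied head still points into *src. */
static void copy_by_memcpy(struct demo_result *dst,
			   const struct demo_result *src)
{
	memcpy(dst, src, sizeof(*dst));
}

/* Safer: initialise the head in place and copy only the plain fields. */
static void copy_by_field(struct demo_result *dst,
			  const struct demo_result *src)
{
	init_list_head(&dst->actions);
	dst->classid = src->classid;
}
```

    After copy_by_memcpy() the destination list is neither empty nor valid,
    since its next pointer refers to the source head; copy_by_field() keeps
    the head self-consistent.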

    Fixes: commit 33be627159913b0 (net_sched: act: use standard struct list_head)
    Reported-by: Kelly Anderson
    Tested-by: Kelly Anderson
    Cc: David S. Miller
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Cong Wang
     
  • [ Upstream commit 78ff4be45a4c51d8fb21ad92e4fabb467c6c3eeb ]

    We need to initialize the fallback device to have a correct mtu
    set on this device. Otherwise the mtu is set to zero and the device
    is unusable.

    Fixes: fd58156e456d ("IPIP: Use ip-tunneling code.")
    Cc: Pravin B Shelar
    Signed-off-by: Steffen Klassert
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Steffen Klassert
     
  • [ Upstream commit cc2f33860cea0e48ebec096130bd0f7c4bf6e0bc ]

    Change introduced by 88e48d7b3340ef07b108eb8a8b3813dd093cc7f7
    ("batman-adv: make DAT drop ARP requests targeting local clients")
    implements a check that prevents DAT from using the caching
    mechanism when the client that is supposed to provide a reply
    to an arp request is local.

    However change brought by be1db4f6615b5e6156c807ea8985171c215c2d57
    ("batman-adv: make the Distributed ARP Table vlan aware")
    has not converted the above check into its vlan aware version
    thus making it useless when the local client is behind a vlan.

    Fix the behaviour by properly specifying the vlan when
    checking for a client being local or not.

    Reported-by: Simon Wunderlich
    Signed-off-by: Antonio Quartulli
    Signed-off-by: Marek Lindner
    Signed-off-by: Greg Kroah-Hartman

    Antonio Quartulli
     
  • [ Upstream commit 377fe0f968b30a1a714fab53a908061914f30e26 ]

    A pointer to the orig_node representing a bat-gateway is
    stored in the gw_node->orig_node member, but the refcount
    for such orig_node is never increased.
    This leads to memory faults when gw_node->orig_node is accessed
    and the originator has already been freed.

    Fix this by increasing the refcount on gw_node creation
    and decreasing it on gw_node free.
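
    A minimal user-space sketch of the get-on-create / put-on-free pairing
    described above. All names (demo_orig_node, demo_gw_node and the helpers)
    are hypothetical, and a plain int stands in for the kernel's atomic
    reference counter.

```c
#include <stdlib.h>

struct demo_orig_node {
	int refcount;
};

struct demo_gw_node {
	struct demo_orig_node *orig_node;
};

/* The creator starts out holding one reference. */
static struct demo_orig_node *demo_orig_new(void)
{
	struct demo_orig_node *orig = calloc(1, sizeof(*orig));

	if (orig)
		orig->refcount = 1;
	return orig;
}

static void demo_orig_get(struct demo_orig_node *orig)
{
	orig->refcount++;
}

/* Returns 1 if this call dropped the last reference and freed the node. */
static int demo_orig_put(struct demo_orig_node *orig)
{
	if (--orig->refcount > 0)
		return 0;
	free(orig);
	return 1;
}

/* Storing the pointer takes a reference, so the node outlives its creator. */
static struct demo_gw_node *demo_gw_new(struct demo_orig_node *orig)
{
	struct demo_gw_node *gw = malloc(sizeof(*gw));

	if (!gw)
		return NULL;
	demo_orig_get(orig);
	gw->orig_node = orig;
	return gw;
}

/* Freeing the gateway drops the reference taken in demo_gw_new(). */
static int demo_gw_free(struct demo_gw_node *gw)
{
	int last = demo_orig_put(gw->orig_node);

	free(gw);
	return last;
}
```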

    Signed-off-by: Antonio Quartulli
    Signed-off-by: Marek Lindner
    Signed-off-by: Greg Kroah-Hartman

    Antonio Quartulli
     
  • [ Upstream commit be181015a189cd141398b761ba4e79d33fe69949 ]

    In the new fragmentation code the batadv_frag_send_packet()
    function obtains a reference to the primary_if, but it does
    not release it upon return.

    This reference imbalance prevents the primary_if (and then
    the related netdevice) from being properly released on shutdown.

    Fix this by releasing the primary_if in batadv_frag_send_packet().

    Introduced by ee75ed88879af88558818a5c6609d85f60ff0df4
    ("batman-adv: Fragment and send skbs larger than mtu")

    Cc: Martin Hundebøll
    Signed-off-by: Antonio Quartulli
    Signed-off-by: Marek Lindner
    Acked-by: Martin Hundebøll
    Signed-off-by: Greg Kroah-Hartman

    Antonio Quartulli
     
  • [ Upstream commit 16a4142363b11952d3aa76ac78004502c0c2fe6e ]

    If hard_iface is NULL and the goto out path is taken, batadv_hardif_free_ref()
    doesn't check for NULL before dereferencing it to get to the refcount.

    Introduced in cb1c92ec37fb70543d133a1fa7d9b54d6f8a1ecd
    ("batman-adv: add debugfs support to view multiif tables").

    Reported-by: Sven Eckelmann
    Signed-off-by: Marek Lindner
    Acked-by: Antonio Quartulli
    Signed-off-by: Antonio Quartulli
    Signed-off-by: Greg Kroah-Hartman

    Marek Lindner
     
  • [ Upstream commit 29e98242783ed3ba569797846a606ba66f781625 ]

    Starting from linux-3.13, GRO attempts to build full size skbs.

    The problem is that the commit assumed one particular field in skb->cb[]
    was clean, but that is not the case on some stacked devices.

    Timo reported a crash in case traffic is decrypted before
    reaching a GRE device.

    Fix this by initializing NAPI_GRO_CB(skb)->last at the right place,
    this also removes one conditional.

    Thanks a lot to Timo for providing full reports and bisecting this.

    Fixes: 8a29111c7ca6 ("net: gro: allow to build full sized skb")
    Bisected-by: Timo Teras
    Signed-off-by: Eric Dumazet
    Tested-by: Timo Teräs
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit 81c708068dfedece038e07d818ba68333d8d885d ]

    I forgot to add a NULL entry to bond_intmax_tbl when I introduced
    it with the conversion of arp_interval, so add it now.
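
    A minimal sketch of why the terminating entry matters, assuming the table
    is walked until an entry with a NULL name is found. The struct, table
    contents and lookup helper are hypothetical, not the bonding option API.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

struct demo_opt_value {
	const char *name;
	int value;
};

/* The last entry is the sentinel; without it the loop below would run
 * past the end of the array. */
static const struct demo_opt_value demo_intmax_tbl[] = {
	{ "off", 0 },
	{ "maxval", INT_MAX },
	{ NULL, -1 },
};

/* Walk the table until the NULL-named sentinel; -1 means "not found". */
static int demo_lookup(const struct demo_opt_value *tbl, const char *name)
{
	for (; tbl->name; tbl++) {
		if (strcmp(tbl->name, name) == 0)
			return tbl->value;
	}
	return -1;
}
```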

    CC: Jay Vosburgh
    CC: Veaceslav Falico
    CC: Andy Gospodarek

    Fixes: 7bdb04ed0dbf ("bonding: convert arp_interval to use the new option API")
    Signed-off-by: Nikolay Aleksandrov
    Acked-by: Veaceslav Falico
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Nikolay Aleksandrov
     
  • [ Upstream commit b394745df2d9d4c30bf1bcc55773bec6f3bc7c67 ]

    After the call to phy_init_hw failed in phy_attach_direct, phy_detach is called
    to detach the phy device from its network device. If the attached driver is a
    generic phy driver, this also detaches the driver. Subsequently phy_resume
    is called, which assumes without checking that a driver is attached to the
    device. This will result in a crash such as

    Unable to handle kernel paging request for data at address 0xffffffffffffff90
    Faulting instruction address: 0xc0000000003a0e18
    Oops: Kernel access of bad area, sig: 11 [#1]
    ...
    NIP [c0000000003a0e18] .phy_attach_direct+0x68/0x17c
    LR [c0000000003a0e6c] .phy_attach_direct+0xbc/0x17c
    Call Trace:
    [c0000003fc0475d0] [c0000000003a0e6c] .phy_attach_direct+0xbc/0x17c (unreliable)
    [c0000003fc047670] [c0000000003a0ff8] .phy_connect_direct+0x28/0x98
    [c0000003fc047700] [c0000000003f0074] .of_phy_connect+0x4c/0xa4

    Only call phy_resume if phy_init_hw was successful.

    Signed-off-by: Guenter Roeck
    Acked-by: Florian Fainelli
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Guenter Roeck
     
  • [ Upstream commit 200b916f3575bdf11609cb447661b8d5957b0bbf ]

    From: Cong Wang

    Commit 50624c934db18ab90 (net: Delay default_device_exit_batch until no
    devices are unregistering) introduced rtnl_lock_unregistering() for
    default_device_exit_batch(). The same race could happen when we rmmod a
    driver which calls rtnl_link_unregister(), as we call dev->destructor
    without the rtnl lock.

    In the long term, I think we should clean up the mess of netdev_run_todo()
    and the net namespace exit code.

    Cc: Eric W. Biederman
    Cc: David S. Miller
    Signed-off-by: Cong Wang
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Cong Wang
     
  • [ Upstream commit 3a1cebe7e05027a1c96f2fc1a8eddf5f19b78f42 ]

    tot_len does specify the size of struct ipv6_txoptions. We need opt_flen +
    opt_nflen to calculate the overall length of additional ipv6 extensions.

    I found this while auditing the ipv6 output path for a memory corruption
    reported by Alexey Preobrazhensky while he fuzzed an instrumented
    AddressSanitizer kernel with trinity. This may or may not be the cause
    of the original bug.

    Fixes: 4df98e76cde7c6 ("ipv6: pmtudisc setting not respected with UFO/CORK")
    Reported-by: Alexey Preobrazhensky
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Hannes Frederic Sowa
     
  • [ Upstream commit 3d4405226d27b3a215e4d03cfa51f536244e5de7 ]

    net_get_random_once depends on the static keys infrastructure to patch up
    the branch to the slow path during boot. This was realized by abusing the
    static keys api and defining a new initializer to not enable the call
    site while still indicating that the branch point should get patched
    up. This was needed to have the fast path considered likely by gcc.

    The static key initialization during boot up normally walks through all
    the registered keys and either patches in ideal nops or enables the jump
    site, but it omitted that step on x86 if ideal nops were already placed at
    static_key branch points. Thus net_get_random_once branches did not always
    become active.

    This patch switches net_get_random_once to the ordinary static_key
    api and thus places the kernel fast path in the - by gcc considered -
    unlikely path. Microbenchmarks on Intel and AMD x86-64 showed that
    the unlikely path actually beats the likely path in terms of cycle cost
    and that different nop patterns did not make much difference, thus this
    switch should not be noticeable.

    Fixes: a48e42920ff38b ("net: introduce new macro net_get_random_once")
    Reported-by: Tuomas Räsänen
    Cc: Linus Torvalds
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Hannes Frederic Sowa
     
  • [ Upstream commit e84d2f8d2ae33c8215429824e1ecf24cbca9645e ]

    This is the s390 variant of Alexei's JIT bug fix.
    (patch description below stolen from Alexei's patch)

    bpf_alloc_binary() adds 128 bytes of room to JITed program image
    and rounds it up to the nearest page size. If image size is close
    to page size (like 4000), it is rounded to two pages:
    round_up(4000 + 4 + 128) == 8192
    then 'hole' is computed as 8192 - (4000 + 4) = 4188
    If prandom_u32() % hole selects a number >= PAGE_SIZE - sizeof(*header)
    then kernel will crash during bpf_jit_free():

    kernel BUG at arch/x86/mm/pageattr.c:887!
    Call Trace:
    [] change_page_attr_set_clr+0x135/0x460
    [] ? _raw_spin_unlock_irq+0x30/0x50
    [] set_memory_rw+0x2f/0x40
    [] bpf_jit_free_deferred+0x2d/0x60
    [] process_one_work+0x1d8/0x6a0
    [] ? process_one_work+0x178/0x6a0
    [] worker_thread+0x11c/0x370

    since bpf_jit_free() does:
    unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK;
    struct bpf_binary_header *header = (void *)addr;
    to compute start address of 'bpf_binary_header'
    and header->pages will pass junk to:
    set_memory_rw(addr, header->pages);

    Fix it by making sure that &header->image[prandom_u32() % hole] and &header
    are in the same page.

    Fixes: aa2d2c73c21f2 ("s390/bpf,jit: address randomize and write protect jit code")

    Reported-by: Alexei Starovoitov
    Cc: # v3.11+
    Signed-off-by: Heiko Carstens
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Heiko Carstens
     
  • [ Upstream commit 773cd38f40b8834be991dbfed36683acc1dd41ee ]

    bpf_alloc_binary() adds 128 bytes of room to JITed program image
    and rounds it up to the nearest page size. If image size is close
    to page size (like 4000), it is rounded to two pages:
    round_up(4000 + 4 + 128) == 8192
    then 'hole' is computed as 8192 - (4000 + 4) = 4188
    If prandom_u32() % hole selects a number >= PAGE_SIZE - sizeof(*header)
    then kernel will crash during bpf_jit_free():

    kernel BUG at arch/x86/mm/pageattr.c:887!
    Call Trace:
    [] change_page_attr_set_clr+0x135/0x460
    [] ? _raw_spin_unlock_irq+0x30/0x50
    [] set_memory_rw+0x2f/0x40
    [] bpf_jit_free_deferred+0x2d/0x60
    [] process_one_work+0x1d8/0x6a0
    [] ? process_one_work+0x178/0x6a0
    [] worker_thread+0x11c/0x370

    since bpf_jit_free() does:
    unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK;
    struct bpf_binary_header *header = (void *)addr;
    to compute start address of 'bpf_binary_header'
    and header->pages will pass junk to:
    set_memory_rw(addr, header->pages);

    Fix it by making sure that &header->image[prandom_u32() % hole] and &header
    are in the same page
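
    The arithmetic above can be checked with a small sketch. It assumes a
    4096-byte page and the 4-byte header used in the example; the demo_*
    names are hypothetical. Capping the hole at PAGE_SIZE - sizeof(header) is
    one way to keep the image start in the header's page, shown here as an
    illustration rather than as the exact patch.

```c
#define DEMO_PAGE_SIZE 4096u
#define DEMO_HDR_SIZE  4u	/* assumed header size, as in the text */

static unsigned int demo_round_up(unsigned int x, unsigned int align)
{
	return (x + align - 1) / align * align;
}

/* Allocation size: program + header + 128 bytes of room, page aligned. */
static unsigned int demo_alloc_size(unsigned int proglen)
{
	return demo_round_up(proglen + DEMO_HDR_SIZE + 128, DEMO_PAGE_SIZE);
}

/* Unbounded hole: can push the image start into the second page. */
static unsigned int demo_hole_unbounded(unsigned int proglen)
{
	return demo_alloc_size(proglen) - (proglen + DEMO_HDR_SIZE);
}

/* Bounded hole: any offset below it, added to the header size, stays
 * below DEMO_PAGE_SIZE, so the image starts in the header's page. */
static unsigned int demo_hole_bounded(unsigned int proglen)
{
	unsigned int hole = demo_hole_unbounded(proglen);
	unsigned int limit = DEMO_PAGE_SIZE - DEMO_HDR_SIZE;

	return hole < limit ? hole : limit;
}
```

    For a 4000-byte program the unbounded hole is 4188, so offsets of 4092
    and above land in the second page; the bounded variant limits it to 4092.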

    Fixes: 314beb9bcabfd ("x86: bpf_jit_comp: secure bpf jit against spraying attacks")
    Signed-off-by: Alexei Starovoitov
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Alexei Starovoitov
     
  • [ Upstream commit 709de13f0c532fe9c468c094aff069a725ed57fe ]

    When an interface is removed separately, all neighbors need to be
    checked if they have a neigh_ifinfo structure for that particular
    interface. If that is the case, remove that ifinfo so any references to
    a hard interface can be freed.

    This is a regression introduced by
    89652331c00f43574515059ecbf262d26d885717
    ("batman-adv: split tq information in neigh_node struct")

    Reported-by: Antonio Quartulli
    Signed-off-by: Simon Wunderlich
    Signed-off-by: Marek Lindner
    Signed-off-by: Antonio Quartulli
    Signed-off-by: Greg Kroah-Hartman

    Simon Wunderlich
     
  • [ Upstream commit 7b955a9fc164487d7c51acb9787f6d1b01b35ef6 ]

    The current code will not execute batadv_purge_orig_neighbors() when an
    orig_ifinfo has already been purged. However we need to run it in any
    case. Fix that.

    This is a regression introduced by
    7351a4822d42827ba0110677c0cbad88a3d52585
    ("batman-adv: split out router from orig_node")

    Signed-off-by: Simon Wunderlich
    Signed-off-by: Marek Lindner
    Signed-off-by: Antonio Quartulli
    Signed-off-by: Greg Kroah-Hartman

    Simon Wunderlich
     
  • [ Upstream commit 000c8dff97311357535d64539e58990526e4de70 ]

    When an interface is removed from batman-adv, the orig_ifinfo of an
    orig_node may be removed without releasing the router first.
    This will prevent the reference for the neighbor pointed at by the
    orig_ifinfo->router from being released, and this leak may result in
    reference leaks for the interface used by this neighbor. Fix that.

    This is a regression introduced by
    7351a4822d42827ba0110677c0cbad88a3d52585
    ("batman-adv: split out router from orig_node").

    Reported-by: Antonio Quartulli
    Signed-off-by: Simon Wunderlich
    Signed-off-by: Marek Lindner
    Signed-off-by: Antonio Quartulli
    Signed-off-by: Greg Kroah-Hartman

    Simon Wunderlich
     
  • [ Upstream commit c1e517fbbcdb13f50662af4edc11c3251fe44f86 ]

    The neigh_ifinfo object must be freed if it has been used in
    batadv_iv_ogm_process_per_outif().

    This is a regression introduced by
    89652331c00f43574515059ecbf262d26d885717
    ("batman-adv: split tq information in neigh_node struct")

    Reported-by: Antonio Quartulli
    Signed-off-by: Simon Wunderlich
    Signed-off-by: Marek Lindner
    Signed-off-by: Antonio Quartulli
    Signed-off-by: Greg Kroah-Hartman

    Simon Wunderlich
     
  • [ Upstream commit 2176d5d41891753774f648b67470398a5acab584 ]

    Since commit 7e98056964("ipv6: router reachability probing"), a router that
    falls into NUD_FAILED will be probed.

    Now if function rt6_select() selects a router whose neighbour state is
    NUD_FAILED, and at the same time function rt6_probe() changes the neighbour
    state to NUD_PROBE, then function dst_neigh_output() can directly send
    packets, even though the neighbour is still unreachable. If we set nud_state
    to NUD_INCOMPLETE instead of NUD_PROBE, packets will not be sent out until
    the neighbour is reachable.

    In addition, because the route should be probed with a single NS, we must
    set neigh->probes to neigh_max_probes(), so that when the neigh timer
    expires, neigh_timer_handler() will not send further NS messages.

    Signed-off-by: Duan Jiong
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Duan Jiong
     
  • [ Upstream commit c8965932a2e3b70197ec02c6741c29460279e2a8 ]

    The function ip6_tnl_validate assumes that the rtnl
    attribute IFLA_IPTUN_PROTO is always filled. If this
    attribute is not filled by the userspace application,
    the kernel crashes with a NULL pointer dereference. This
    patch fixes the potential kernel crash when
    IFLA_IPTUN_PROTO is missing.

    Signed-off-by: Susant Sahani
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Susant Sahani
     
  • [ Upstream commit 1c3639005f48492e5f2d965779efd814e80f8b15 ]

    If the sfc driver is in legacy interrupt mode (either explicitly by
    using the interrupt_mode module param or by falling back to it), it will
    hit a warning at kernel/irq/manage.c because it tries to free an irq
    which it never allocated: the MSI(X) irqs are zero and it tries to free
    them unconditionally. Fix it by checking if we're in legacy mode and
    freeing the appropriate irqs.

    CC: Zenghui Shi
    CC: Ben Hutchings
    CC:
    CC: Shradha Shah
    CC: David S. Miller

    Fixes: 1899c111a535 ("sfc: Fix IRQ cleanup in case of a probe failure")
    Reported-by: Zenghui Shi
    Signed-off-by: Nikolay Aleksandrov
    Acked-by: Shradha Shah
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Nikolay Aleksandrov
     
  • [ Upstream commit bbeb0eadcf9fe74fb2b9b1a6fea82cd538b1e556 ]

    Clearing the IFF_ALLMULTI flag on a down interface could cause an allmulti
    overflow on the underlying interface.

    Attempting to set IFF_ALLMULTI on the underlying interface would cause an
    error and the log message:

    "allmulti touches root, set allmulti failed."

    Signed-off-by: Peter Christensen
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Peter Christensen
     
  • [ Upstream commit 6b5eeb7f874b689403e52a646e485d0191ab9507 ]

    This driver maps 802.1q VLANs to MBIM sessions. The mapping is based on
    a bogus assumption that all tagged frames will use the acceleration API
    because we enable NETIF_F_HW_VLAN_CTAG_TX. This fails for e.g. frames
    tagged in userspace using packet sockets. Such frames will erroneously
    be considered as untagged and silently dropped based on not being IP.

    Fix by falling back to looking into the ethernet header for a tag if no
    accelerated tag was found.
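
    A minimal sketch of the fallback, assuming a raw ethernet frame buffer and
    a negative value meaning "no accelerated tag". The function name and
    calling convention are hypothetical; only the 802.1Q layout (TPID 0x8100
    at offset 12, 12-bit VID in the TCI) is standard.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_ETH_P_8021Q   0x8100u
#define DEMO_VLAN_VID_MASK 0x0fffu

/* Returns the VLAN ID for a frame, or -1 if the frame is untagged.
 * accel_tci < 0 means no hardware-accelerated tag was supplied. */
static int demo_frame_vid(const uint8_t *frame, size_t len, int accel_tci)
{
	unsigned int tpid, tci;

	/* Prefer the accelerated tag when there is one. */
	if (accel_tci >= 0)
		return accel_tci & DEMO_VLAN_VID_MASK;

	/* Fallback: dst(6) + src(6) + TPID(2) + TCI(2) must be present. */
	if (len < 16)
		return -1;

	tpid = ((unsigned int)frame[12] << 8) | frame[13];
	if (tpid != DEMO_ETH_P_8021Q)
		return -1;

	tci = ((unsigned int)frame[14] << 8) | frame[15];
	return tci & DEMO_VLAN_VID_MASK;
}
```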

    Fixes: a82c7ce5bc5b ("net: cdc_ncm: map MBIM IPS SessionID to VLAN ID")
    Cc: Greg Suarez
    Signed-off-by: Bjørn Mork
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Bjørn Mork
     
  • [ Upstream commit aeefa1ecfc799b0ea2c4979617f14cecd5cccbfd ]

    Increment fib_info_cnt in fib_create_info() right after successfully
    allocating the fib_info structure; otherwise a fib_metrics allocation
    failure leads to fib_info_cnt being incorrectly decremented in
    free_fib_info(), called on the error path from fib_create_info().
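
    A minimal user-space sketch of the counter ordering, with hypothetical
    demo_* names and a fail_metrics flag to force the error path. The counter
    is bumped as soon as the object exists, so the decrement in the shared
    free routine always has a matching increment.

```c
#include <stdlib.h>

static int demo_info_cnt;

struct demo_info {
	int *metrics;
};

/* Shared free path: always decrements the counter. */
static void demo_free_info(struct demo_info *fi)
{
	free(fi->metrics);
	free(fi);
	demo_info_cnt--;
}

static struct demo_info *demo_create_info(int fail_metrics)
{
	struct demo_info *fi = calloc(1, sizeof(*fi));

	if (!fi)
		return NULL;
	/* Count the object right after it has been allocated. */
	demo_info_cnt++;

	/* fail_metrics simulates a failed second allocation. */
	fi->metrics = fail_metrics ? NULL : calloc(4, sizeof(int));
	if (!fi->metrics) {
		demo_free_info(fi);
		return NULL;
	}
	return fi;
}
```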

    Signed-off-by: Sergey Popovich
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Sergey Popovich
     
  • [ Upstream commit 418a31561d594a2b636c1e2fa94ecd9e1245abb1 ]

    If conntrack defragments incoming ipv6 frags it stores largest original
    frag size in ip6cb and sets ->local_df.

    We must thus first test the largest original frag size vs. mtu, and not
    vice versa.

    Without this patch PKTTOOBIG is still generated in ip6_fragment() later
    in the stack, but

    1) IPSTATS_MIB_INTOOBIGERRORS won't increment
    2) packet did (needlessly) traverse netfilter postrouting hook.

    Fixes: fe6cc55f3a9 ("net: ip, ipv6: handle gso skbs in forwarding path")
    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Florian Westphal
     
  • [ Upstream commit ca6c5d4ad216d5942ae544bbf02503041bd802aa ]

    local_df means 'ignore DF bit if set', so if it's set we're
    allowed to perform ip fragmentation.

    This wasn't noticed earlier because the output path also drops such skbs
    (and emits needed icmp error) and because netfilter ip defrag did not
    set local_df until a couple of days ago.

    The only difference is that DF packets larger than the MTU are now
    discarded earlier (e.g. we avoid a pointless netfilter postrouting trip).

    While at it, drop the repeated test ip_exceeds_mtu, checking it once
    is enough...

    Fixes: fe6cc55f3a9 ("net: ip, ipv6: handle gso skbs in forwarding path")
    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Florian Westphal
     
  • [ Upstream commit 4f4178f3bb1f470d7fb863ec531e08e20a0fd51c ]

    Fixes this warning introduced by commit 5b8f15f78e6f
    ("net: cdc_mbim: handle IPv6 Neigbor Solicitations"):

    ===============================
    [ INFO: suspicious RCU usage. ]
    3.15.0-rc3 #213 Tainted: G W O
    -------------------------------
    net/8021q/vlan_core.c:69 suspicious rcu_dereference_check() usage!

    other info that might help us debug this:

    rcu_scheduler_active = 1, debug_locks = 1
    no locks held by ksoftirqd/0/3.

    stack backtrace:
    CPU: 0 PID: 3 Comm: ksoftirqd/0 Tainted: G W O 3.15.0-rc3 #213
    Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011
    0000000000000001 ffff880232533bf0 ffffffff813a5ee6 0000000000000006
    ffff880232530090 ffff880232533c20 ffffffff81076b94 0000000000000081
    0000000000000000 ffff8802085ac000 ffff88007fc8ea00 ffff880232533c50
    Call Trace:
    [] dump_stack+0x4e/0x68
    [] lockdep_rcu_suspicious+0xfa/0x103
    [] __vlan_find_dev_deep+0x54/0x94
    [] cdc_mbim_rx_fixup+0x379/0x66a [cdc_mbim]
    [] ? _raw_spin_unlock_irqrestore+0x3a/0x49
    [] ? trace_hardirqs_on_caller+0x192/0x1a1
    [] usbnet_bh+0x59/0x287 [usbnet]
    [] tasklet_action+0xbb/0xcd
    [] __do_softirq+0x14c/0x30d
    [] run_ksoftirqd+0x1f/0x50
    [] smpboot_thread_fn+0x172/0x18e
    [] ? SyS_setgroups+0xdf/0xdf
    [] kthread+0xb5/0xbd
    [] ? __wait_for_common+0x13b/0x170
    [] ? __kthread_parkme+0x5c/0x5c
    [] ret_from_fork+0x7c/0xb0
    [] ? __kthread_parkme+0x5c/0x5c

    Fixes: 5b8f15f78e6f ("net: cdc_mbim: handle IPv6 Neigbor Solicitations")
    Signed-off-by: Bjørn Mork
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Bjørn Mork
     
  • [ Upstream commit 22fb22eaebf4d16987f3fd9c3484c436ee0badf2 ]

    The connected check fails to check for ip_gre nbma mode tunnels
    properly. ip_gre creates temporary tnl_params with daddr specified
    to pass-in the actual target on per-packet basis from neighbor
    layer. Detect these tunnels by inspecting the actual tunnel
    configuration.

    Minimal test case:
    ip route add 192.168.1.1/32 via 10.0.0.1
    ip route add 192.168.1.2/32 via 10.0.0.2
    ip tunnel add nbma0 mode gre key 1 tos c0
    ip addr add 172.17.0.0/16 dev nbma0
    ip link set nbma0 up
    ip neigh add 172.17.0.1 lladdr 192.168.1.1 dev nbma0
    ip neigh add 172.17.0.2 lladdr 192.168.1.2 dev nbma0
    ping 172.17.0.1
    ping 172.17.0.2

    The second ping should be going to 192.168.1.2 via 10.0.0.2;
    but the cached gre tunnel level route is used and it's actually going
    to 192.168.1.1 via 10.0.0.1.

    The lladdr's need to go to separate dst for the bug to trigger.
    Test case uses separate route entries, but this can also happen
    when the route entry is same: if there is a nexthop exception or
    the GRE tunnel is IPsec'ed in which case the dst points to xfrm
    bundle unique to the gre lladdr.

    Fixes: 7d442fab0a67 ("ipv4: Cache dst in tunnels")
    Signed-off-by: Timo Teräs
    Cc: Tom Herbert
    Cc: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Timo Teräs
     
  • [ Upstream commit e96f2e7c430014eff52c93cabef1ad4f42ed0db1 ]

    In ip_tunnel_rcv(), set skb->network_header to inner IP header
    before IP_ECN_decapsulate().

    Without the fix, IP_ECN_decapsulate() takes outer IP header as
    inner IP header, possibly causing error messages or packet drops.

    Note that this skb_reset_network_header() call was in this spot when
    the original feature for checking consistency of ECN bits through
    tunnels was added in eccc1bb8d4b4 ("tunnel: drop packet if ECN present
    with not-ECT"). It was only removed from this spot in 3d7b46cd20e3
    ("ip_tunnel: push generic protocol handling to ip_tunnel module.").

    Fixes: 3d7b46cd20e3 ("ip_tunnel: push generic protocol handling to ip_tunnel module.")
    Reported-by: Neal Cardwell
    Signed-off-by: Ying Cai
    Acked-by: Neal Cardwell
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Ying Cai
     
  • [ Upstream commit 83d3459a5928f18c9344683e31bc2a7c3c25562a ]

    Carrying out PCI speed/width checks through pcie_get_minimum_link()
    on VFs yields wrong results, so remove them.

    Fixes: b912b2f ('net/mlx4_core: Warn if device doesn't have enough PCI bandwidth')
    Signed-off-by: Eyal Perry
    Signed-off-by: Or Gerlitz
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eyal Perry
     
  • [ Upstream commit 9becd707841207652449a8dfd90fe9c476d88546 ]

    Commit 4d619f625a60 ("net: cdc_ncm: no point in filling up the NTBs
    if we send ZLPs") changed the padding logic for devices with the ZLP
    flag set. This meant that frames of any size will be sent without
    additional padding, except for the single byte added if the size is
    a multiple of the USB packet size. But if the unpadded size is
    identical to the maximum frame size, and the maximum size is a
    multiple of the USB packet size, then this one-byte padding will
    overflow the buffer.

    Prevent padding if already at maximum frame size, letting usbnet
    transmit a ZLP instead in this case.
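
    A minimal sketch of the padding rule for the ZLP case, assuming lengths in
    bytes. The helper name and parameters are hypothetical: one pad byte is
    added only when the frame is a multiple of the USB packet size and there
    is still room below the maximum frame size.

```c
#include <stddef.h>

/* Returns the frame length after optional one-byte padding. */
static size_t demo_zlp_padded_len(size_t len, size_t tx_max, size_t maxpacket)
{
	/* Already at the maximum: do not pad, a ZLP is sent instead. */
	if (len < tx_max && len % maxpacket == 0)
		return len + 1;
	return len;
}
```

    With tx_max = 16384 and maxpacket = 512, a 1024-byte frame grows to 1025
    bytes, while a 16384-byte frame is left alone so it cannot overflow.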

    Fixes: 4d619f625a60 ("net: cdc_ncm: no point in filling up the NTBs if we send ZLPs")
    Reported-by: Yu-an Shih
    Signed-off-by: Bjørn Mork
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Bjørn Mork
     
  • [ Upstream commit 2c4a336e0a3e203fab6aa8d8f7bb70a0ad968a6b ]

    Right now the core vsock module is the owner of the proto family. This
    means there's nothing preventing the transport module from unloading if
    there are open sockets, which results in a panic. Fix that by allowing
    the transport to be the owner, which will refcount it properly.

    Includes version bump to 1.0.1.0-k

    Passes checkpatch this time, I swear...

    Acked-by: Dmitry Torokhov
    Signed-off-by: Andy King
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Andy King
     
  • [ Upstream commit f6a082fed1e6407c2f4437d0d963b1bcbe5f9f58 ]

    hhf_change() takes the sch_tree_lock and releases it but misses the
    error cases. Fix the missed case here.

    To reproduce try a command like this,

    # tc qdisc change dev p3p2 root hhf quantum 40960 non_hh_weight 300000
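
    One common way to keep lock and unlock balanced is to validate the
    parameters before taking the lock, so every early error return happens
    with no lock held. The sketch below illustrates that pattern with
    hypothetical demo_* names and a counter standing in for the tree lock; it
    is not the actual hhf_change() code.

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>

/* Counter standing in for a lock: non-zero means "still held". */
static int demo_lock_depth;

static void demo_tree_lock(void)   { demo_lock_depth++; }
static void demo_tree_unlock(void) { demo_lock_depth--; }

struct demo_hhf {
	uint32_t quantum;
	uint32_t non_hh_weight;
};

static int demo_hhf_change(struct demo_hhf *q, uint32_t quantum,
			   uint32_t weight)
{
	uint64_t non_hh_quantum = (uint64_t)quantum * weight;

	/* Validate first, so the error return happens with no lock held. */
	if (non_hh_quantum > INT_MAX)
		return -EINVAL;

	demo_tree_lock();
	q->quantum = quantum;
	q->non_hh_weight = weight;
	demo_tree_unlock();
	return 0;
}
```

    With the values from the command above, 40960 * 300000 exceeds INT_MAX,
    so the call fails with -EINVAL and the lock counter stays at zero.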

    Signed-off-by: John Fastabend
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    John Fastabend
     
  • [ Upstream commit 0cda345d1b2201dd15591b163e3c92bad5191745 ]

    Commit b9f47a3aaeab (tcp_cubic: limit delayed_ack ratio to prevent
    divide error) tries to prevent a divide error, but there is still a small
    chance that delayed_ack can reach zero. If the param cnt gets a
    negative value, then ratio+cnt can overflow and may happen to be zero.
    As a result, min(ratio, ACK_RATIO_LIMIT) will evaluate to zero.

    In some old kernels, such as 2.6.32, there is a bug that would
    pass a negative param, which then ultimately leads to this divide error.

    Commit 5b35e1e6e9c (tcp: fix tcp_trim_head() to adjust segment count
    with skb MSS) fixed the negative param issue. However,
    it is safer to fix the range of delayed_ack as well,
    to make sure we do not hit a divide by zero.
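
    A minimal sketch of the wrap-around, assuming an unsigned 32-bit ratio, a
    shift of 4 and a limit of 32 << 4; the demo_* names and constants are
    assumptions for illustration. An upper bound alone lets a wrapped result
    of zero through, while clamping to [1, limit] does not.

```c
#include <stdint.h>

#define DEMO_ACK_RATIO_SHIFT 4
#define DEMO_ACK_RATIO_LIMIT (32u << DEMO_ACK_RATIO_SHIFT)

/* Running ratio update; a negative cnt wraps in unsigned arithmetic. */
static uint32_t demo_ratio(uint32_t delayed_ack, int32_t cnt)
{
	uint32_t ratio = delayed_ack;

	ratio -= delayed_ack >> DEMO_ACK_RATIO_SHIFT;
	ratio += (uint32_t)cnt;
	return ratio;
}

/* Upper bound only: zero can still slip through. */
static uint32_t demo_update_min(uint32_t delayed_ack, int32_t cnt)
{
	uint32_t ratio = demo_ratio(delayed_ack, cnt);

	return ratio < DEMO_ACK_RATIO_LIMIT ? ratio : DEMO_ACK_RATIO_LIMIT;
}

/* Lower and upper bound: the result is never zero. */
static uint32_t demo_update_clamped(uint32_t delayed_ack, int32_t cnt)
{
	uint32_t ratio = demo_ratio(delayed_ack, cnt);

	if (ratio < 1u)
		return 1u;
	if (ratio > DEMO_ACK_RATIO_LIMIT)
		return DEMO_ACK_RATIO_LIMIT;
	return ratio;
}
```

    Starting from delayed_ack = 32 with cnt = -30, the ratio becomes
    32 - 2 - 30 = 0; the min() variant returns 0 and the clamped one returns 1.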

    CC: Stephen Hemminger
    Signed-off-by: Liu Yu
    Signed-off-by: Eric Dumazet
    Acked-by: Neal Cardwell
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Liu Yu
     
  • [ Upstream commit f114890cdf84d753f6b41cd0cc44ba51d16313da ]

    This reverts commit 12a2856b604476c27d85a5f9a57ae1661fc46019.
    The commit above doesn't appear to be necessary any more as the
    checksums appear to be correctly computed/validated.

    Additionally the above commit breaks kvm configurations where
    one VM is using a device that supports checksum offload (virtio) and
    the other VM does not.
    In this case, packets leaving the virtio device will have CHECKSUM_PARTIAL
    set. The packets are forwarded to a macvtap that has offload features
    turned off. Since we use CHECKSUM_UNNECESSARY, the host does not
    update the checksum and thus a bad checksum is passed up to
    the guest.

    CC: Daniel Lezcano
    CC: Patrick McHardy
    CC: Andrian Nord
    CC: Eric Dumazet
    CC: Michael S. Tsirkin
    CC: Jason Wang
    Signed-off-by: Vlad Yasevich
    Acked-by: Michael S. Tsirkin
    Acked-by: Jason Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Vlad Yasevich
     
  • [ Upstream commit cbdb04279ccaefcc702c8757757eea8ed76e50cf ]

    The following is a problematic configuration:

    VM1: virtio-net device connected to macvtap0@eth0
    VM2: e1000 device connected to macvtap1@eth0

    The problem is that virtio-net supports checksum offloading
    and thus sends the packets to the host with CHECKSUM_PARTIAL set.
    On the other hand, e1000 does not support any acceleration.

    For small TCP packets (and this includes the 3-way handshake),
    e1000 ends up receiving packets that only have a partial checksum
    set. This causes TCP to fail checksum validation and to drop
    packets. As a result tcp connections can not be established.

    Commit 3e4f8b787370978733ca6cae452720a4f0c296b8
    ("macvtap: Perform GSO on forwarding path.")
    fixes this issue for large packets that will end up undergoing GSO.
    This commit adds a check for the non-GSO case and attempts to
    compute the checksum for partially checksummed packets in the
    non-GSO case.

    CC: Daniel Lezcano
    CC: Patrick McHardy
    CC: Andrian Nord
    CC: Eric Dumazet
    CC: Michael S. Tsirkin
    CC: Jason Wang
    Signed-off-by: Vlad Yasevich
    Acked-by: Jason Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Vlad Yasevich