13 Aug, 2016

1 commit

  • While hashing out BPF's current_task_under_cgroup helper bits, it came
    to discussion that the skb_in_cgroup helper name was suboptimally chosen.

    Tejun says:

    So, I think in_cgroup should mean that the object is in that
    particular cgroup while under_cgroup in the subhierarchy of that
    cgroup. Let's rename the other subhierarchy test to under too. I
    think that'd be a lot less confusing going forward.

    [...]

    It's more intuitive and gives us the room to implement the real
    "in" test if ever necessary in the future.

    Since this touches uapi bits, we need to change this as long as v4.8
    is not yet officially released. Thus, change the helper enum and rename
    related bits.

    Fixes: 4a482f34afcc ("cgroup: bpf: Add bpf_skb_in_cgroup_proto")
    Reference: http://patchwork.ozlabs.org/patch/658500/
    Suggested-by: Sargun Dhillon
    Suggested-by: Tejun Heo
    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov

    Daniel Borkmann
     

11 Aug, 2016

3 commits

  • The creation of a tunnel vport (geneve, gre, vxlan) brings up a
    corresponding netdev, a multi-step operation which can fail.

    For example, changing a vxlan vport's netdev state to 'up' binds the
    vport's socket to a UDP port - if the binding fails (e.g. due to the
    port being in use), the error is currently ignored giving the
    appearance that the tunnel vport creation completed successfully.

    Signed-off-by: Martynas Pumputis
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Martynas Pumputis
     
  • In commit cf6f7e1d5109 ("tipc: dump monitor attributes"),
    I dereferenced a pointer before checking whether it is valid.
    This is reported by the static checker Smatch as:
    net/tipc/monitor.c:733 tipc_nl_add_monitor_peer()
    warn: variable dereferenced before check 'mon' (see line 731)

    In this commit, we check for a valid monitor before proceeding
    with any other operation.
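
    The reordering described above can be sketched in plain C. This is only
    an illustration: struct monitor and monitor_peer_count() are hypothetical
    stand-ins and are not taken from net/tipc/monitor.c.

```c
#include <stddef.h>

/* Hypothetical stand-in for a monitor object; not the TIPC struct. */
struct monitor {
    int peer_count;
};

/*
 * Returns the peer count, or -1 when no monitor is attached.
 * The NULL test comes first, so 'mon' is never dereferenced
 * before it has been validated.
 */
int monitor_peer_count(const struct monitor *mon)
{
    if (mon == NULL)
        return -1;
    return mon->peer_count;
}
```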

    Fixes: cf6f7e1d5109 ("tipc: dump monitor attributes")
    Reported-by: Dan Carpenter
    Signed-off-by: Parthasarathy Bhuvaragan
    Signed-off-by: David S. Miller

    Parthasarathy Bhuvaragan
     
  • Pablo Neira Ayuso says:

    ====================
    Netfilter fixes for net

    The following patchset contains Netfilter fixes for your net tree,
    they are:

    1) Use mod_timer_pending() to avoid reactivating a dead expectation in
    the h323 conntrack helper, from Liping Zhang.

    2) Oneliner to fix a typo in the register name defined in the nf_tables
    header.

    3) Don't try to look further when we find an inactive element with no
    descendants in the rbtree set implementation, otherwise we crash.

    4) Handle valid zero CSeq in the SIP conntrack helper, from
    Christophe Leroy.

    5) Don't display a trailing slash in conntrack helper with no classes
    via /proc/net/nf_conntrack_expect, from Liping Zhang.

    6) Fix an expectation leak during creation from the nfqueue path, again
    from Liping Zhang.

    7) Validate netlink port ID in verdict message from nfqueue, otherwise
    an injection can be possible. Again from Zhang.

    8) Reject conntrack tuples with different transport protocol on
    original and reply tuples, also from Zhang.

    9) Validate offset and length in nft_exthdr, make sure they fit in
    a u8, from Laura Garcia Liebana.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

10 Aug, 2016

8 commits

  • Fix the direct assignment of offset and length attributes included in
    nft_exthdr structure from u32 data to u8.
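
    A minimal sketch of the kind of range check this implies, assuming the
    value arrives as a 32-bit quantity and must fit in 8 bits. The helper
    name parse_u8_attr() is hypothetical and is not the nf_tables API.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper: narrow a 32-bit attribute to 8 bits.
 * Returns 0 on success and leaves *dest untouched on overflow,
 * instead of silently truncating the value.
 */
int parse_u8_attr(uint32_t value, uint8_t *dest)
{
    if (value > UINT8_MAX)
        return -ERANGE;
    *dest = (uint8_t)value;
    return 0;
}
```

    With a direct assignment, a value such as 256 would be stored as 0;
    with the check it is rejected.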

    Signed-off-by: Laura Garcia Liebana
    Signed-off-by: Pablo Neira Ayuso

    Laura Garcia Liebana
     
  • Adding fdb entries pointing to the bridge device uses fdb_insert(),
    which lacks various checks and does not respect added_by_user flag.

    As a result, some inconsistent behavior can happen:
    * Adding temporary entries succeeds but results in permanent entries.
    * Same goes for "dynamic" and "use".
    * Changing mac address of the bridge device causes deletion of
    user-added entries.
    * Replacing existing entries looks successful from userspace but actually
    is not, regardless of the NLM_F_EXCL flag.

    Use the same logic as other entries and fix them.

    Fixes: 3741873b4f73 ("bridge: allow adding of fdb entries pointing to the bridge device")
    Signed-off-by: Toshiaki Makita
    Acked-by: Roopa Prabhu
    Signed-off-by: David S. Miller

    Toshiaki Makita
     
  • When executing the script included below, the netns delete operation
    hangs with the following message (repeated at 10 second intervals):

    kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1

    This occurs because a reference to the lo interface in the "secure" netns
    is still held by a dst entry in the xfrm bundle cache in the init netns.

    Address this problem by garbage collecting the tunnel netns flow cache
    when a cross-namespace vti interface receives a NETDEV_DOWN notification.

    A more detailed description of the problem scenario (referencing commands
    in the script below):

    (1) ip link add vti_test type vti local 1.1.1.1 remote 1.1.1.2 key 1

    The vti_test interface is created in the init namespace. vti_tunnel_init()
    attaches a struct ip_tunnel to the vti interface's netdev_priv(dev),
    setting the tunnel net to &init_net.

    (2) ip link set vti_test netns secure

    The vti_test interface is moved to the "secure" netns. Note that
    the associated struct ip_tunnel still has tunnel->net set to &init_net.

    (3) ip netns exec secure ping -c 4 -i 0.02 -I 192.168.100.1 192.168.200.1

    The first packet sent using the vti device causes xfrm_lookup() to be
    called as follows:

    dst = xfrm_lookup(tunnel->net, skb_dst(skb), fl, NULL, 0);

    Note that tunnel->net is the init namespace, while skb_dst(skb) references
    the vti_test interface in the "secure" namespace. The returned dst
    references an interface in the init namespace.

    Also note that the first parameter to xfrm_lookup() determines which flow
    cache is used to store the computed xfrm bundle, so after xfrm_lookup()
    returns there will be a cached bundle in the init namespace flow cache
    with a dst referencing a device in the "secure" namespace.

    (4) ip netns del secure

    Kernel begins to delete the "secure" namespace. At some point the
    vti_test interface is deleted, at which point dst_ifdown() changes
    the dst->dev in the cached xfrm bundle flow from vti_test to lo (still
    in the "secure" namespace however).
    Since nothing has happened to cause the init namespace's flow cache
    to be garbage collected, this dst remains attached to the flow cache,
    so the kernel loops waiting for the last reference to lo to go away.

    ip link add br1 type bridge
    ip link set dev br1 up
    ip addr add dev br1 1.1.1.1/8

    ip netns add secure
    ip link add vti_test type vti local 1.1.1.1 remote 1.1.1.2 key 1
    ip link set vti_test netns secure
    ip netns exec secure ip link set vti_test up
    ip netns exec secure ip link s lo up
    ip netns exec secure ip addr add dev lo 192.168.100.1/24
    ip netns exec secure ip route add 192.168.200.0/24 dev vti_test
    ip xfrm policy flush
    ip xfrm state flush
    ip xfrm policy add dir out tmpl src 1.1.1.1 dst 1.1.1.2 \
    proto esp mode tunnel mark 1
    ip xfrm policy add dir in tmpl src 1.1.1.2 dst 1.1.1.1 \
    proto esp mode tunnel mark 1
    ip xfrm state add src 1.1.1.1 dst 1.1.1.2 proto esp spi 1 \
    mode tunnel enc des3_ede 0x112233445566778811223344556677881122334455667788
    ip xfrm state add src 1.1.1.2 dst 1.1.1.1 proto esp spi 1 \
    mode tunnel enc des3_ede 0x112233445566778811223344556677881122334455667788

    ip netns exec secure ping -c 4 -i 0.02 -I 192.168.100.1 192.168.200.1

    ip netns del secure

    Reported-by: Hangbin Liu
    Reported-by: Jan Tluka
    Signed-off-by: Lance Richardson
    Signed-off-by: David S. Miller

    Lance Richardson
     
  • Under certain conditions, the data_ready handler will discard a packet.
    Such packets need to be freed.

    Signed-off-by: David Howells

    David Howells
     
  • Fix a use of a packet after it has been enqueued onto the packet processing
    queue in the data_ready handler. Once on a call's Rx queue, we mustn't
    touch it any more as it may be dequeued and freed by the call processor
    running on a work queue.

    Save the values we need before enqueuing.

    Without this, we can get an oops like the following:

    BUG: unable to handle kernel NULL pointer dereference at 000000000000009c
    IP: [] rxrpc_fast_process_packet+0x724/0xa11 [af_rxrpc]
    PGD 0
    Oops: 0000 [#1] SMP
    Modules linked in: kafs(E) af_rxrpc(E) [last unloaded: af_rxrpc]
    CPU: 2 PID: 0 Comm: swapper/2 Tainted: G E 4.7.0-fsdevel+ #1336
    Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
    task: ffff88040d6863c0 task.stack: ffff88040d68c000
    RIP: 0010:[] [] rxrpc_fast_process_packet+0x724/0xa11 [af_rxrpc]
    RSP: 0018:ffff88041fb03a78 EFLAGS: 00010246
    RAX: ffffffffffffffff RBX: ffff8803ff195b00 RCX: 0000000000000001
    RDX: ffffffffa01854d1 RSI: 0000000000000008 RDI: ffff8803ff195b00
    RBP: ffff88041fb03ab0 R08: 0000000000000000 R09: 0000000000000001
    R10: ffff88041fb038c8 R11: 0000000000000000 R12: ffff880406874800
    R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
    FS: 0000000000000000(0000) GS:ffff88041fb00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 000000000000009c CR3: 0000000001c14000 CR4: 00000000001406e0
    Stack:
    ffff8803ff195ea0 ffff880408348800 ffff880406874800 ffff8803ff195b00
    ffff880408348800 ffff8803ff195ed8 0000000000000000 ffff88041fb03af0
    ffffffffa0186072 0000000000000000 ffff8804054da000 0000000000000000
    Call Trace:

    [] rxrpc_data_ready+0x89d/0xbae [af_rxrpc]
    [] __sock_queue_rcv_skb+0x24c/0x2b2
    [] __udp_queue_rcv_skb+0x4b/0x1bd
    [] udp_queue_rcv_skb+0x281/0x4db
    [] __udp4_lib_rcv+0x7ed/0x963
    [] udp_rcv+0x15/0x17
    [] ip_local_deliver_finish+0x1c3/0x318
    [] ip_local_deliver+0xbb/0xc4
    [] ? inet_del_offload+0x40/0x40
    [] ip_rcv_finish+0x3ce/0x42c
    [] ip_rcv+0x304/0x33d
    [] ? ip_local_deliver_finish+0x318/0x318
    [] __netif_receive_skb_core+0x601/0x6e8
    [] __netif_receive_skb+0x13/0x54
    [] netif_receive_skb_internal+0xbb/0x17c
    [] napi_gro_receive+0xf9/0x1bd
    [] rtl8169_poll+0x32b/0x4a8
    [] net_rx_action+0xe8/0x357
    [] __do_softirq+0x1aa/0x414
    [] irq_exit+0x3d/0xb0
    [] do_IRQ+0xe4/0xfc
    [] common_interrupt+0x93/0x93

    [] ? cpuidle_enter_state+0x1ad/0x2be
    [] ? cpuidle_enter_state+0x1a8/0x2be
    [] cpuidle_enter+0x12/0x14
    [] call_cpuidle+0x39/0x3b
    [] cpu_startup_entry+0x230/0x35d
    [] start_secondary+0xf4/0xf7

    Signed-off-by: David Howells

    David Howells
     
  • Once a packet has been posted to a connection in the data_ready handler, we
    mustn't try reposting if we then find that the connection is dying as the
    refcount has been given over to the dying connection and the packet might
    no longer exist.

    Losing the packet isn't a problem as the peer will retransmit.

    Signed-off-by: David Howells

    David Howells
     
  • The call state machine processor sets up the message parameters for a UDP
    message that it might need to transmit in advance on the basis that there's
    a very good chance it's going to have to transmit either an ACK or an
    ABORT. This requires it to look in the connection struct to retrieve some
    of the parameters.

    However, if the call is complete, the call connection pointer may be NULL
    to dissuade the processor from transmitting a message. There are
    nonetheless some situations where the processor is still going to be
    called - and it's still going to set up message parameters whether it
    needs them or not.

    This results in a NULL pointer dereference at:

    net/rxrpc/call_event.c:837

    To fix this, skip the message pre-initialisation if there's no connection
    attached.

    Signed-off-by: David Howells

    David Howells
     
  • If rxrpc_new_client_call() fails to make a connection, the call record that
    it allocated needs to be marked as RXRPC_CALL_RELEASED before it is passed
    to rxrpc_put_call() to indicate that it no longer has any attachment to the
    AF_RXRPC socket.

    Without this, an assertion failure may occur at:

    net/rxrpc/call_object.c:635

    Signed-off-by: David Howells

    David Howells
     

09 Aug, 2016

11 commits

  • A newly added bugfix caused an uninitialized variable to be
    used for printing debug output. This is harmless as long
    as the debug setting is disabled, but otherwise leads to an
    immediate crash.

    gcc warns about this when -Wmaybe-uninitialized is enabled:

    net/rxrpc/call_object.c: In function 'rxrpc_release_call':
    net/rxrpc/call_object.c:496:163: error: 'sp' may be used uninitialized in this function [-Werror=maybe-uninitialized]

    The initialization was removed but one of the users remains.
    This adds back the initialization.

    Signed-off-by: Arnd Bergmann
    Fixes: 372ee16386bb ("rxrpc: Fix races between skb free, ACK generation and replying")
    Signed-off-by: David Howells

    Arnd Bergmann
     
  • Currently, a user can add a conntrack entry with different l4proto
    values for the original and reply tuples via nfnetlink. For example,
    the original tuple is TCP while the reply tuple is SCTP. This is an
    invalid combination, so we should report EINVAL to userspace.

    Signed-off-by: Liping Zhang
    Signed-off-by: Pablo Neira Ayuso

    Liping Zhang
     
  • Like NFQNL_MSG_VERDICT_BATCH does, we should also reject the verdict
    request when the portid is not the same as the initial portid (it may
    come from another process).

    Fixes: 97d32cf9440d ("netfilter: nfnetlink_queue: batch verdict support")
    Signed-off-by: Liping Zhang
    Reviewed-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Liping Zhang
     
  • A user can use NFQA_EXP to attach expectations to conntracks, but we
    forget to put back the nf_conntrack_expect when it is inserted
    successfully, i.e. in this normal case, the expectation's use refcnt
    will be 3. So even if we unlink it and put it back later, the use
    refcnt is still 1, and the memory is leaked forever.
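
    The counts above can be reproduced with a toy refcount model. It assumes
    that allocation leaves one reference with the caller and that insertion
    takes two more, which is what the numbers 3 and 1 imply. All names are
    hypothetical; this is not the conntrack code.

```c
#include <stdbool.h>

/* Hypothetical refcounted object; not struct nf_conntrack_expect. */
struct expect {
    int use;
    bool freed;
};

/* Allocation: the caller holds one reference. */
void expect_init(struct expect *e)
{
    e->use = 1;
    e->freed = false;
}

/* Insertion: two further references are held while inserted (assumed). */
void expect_insert(struct expect *e)
{
    e->use += 2;
}

/* Drop one reference and free on the last one. */
void expect_put(struct expect *e)
{
    if (--e->use == 0)
        e->freed = true;
}

/* Unlinking drops the two references taken at insertion. */
void expect_unlink(struct expect *e)
{
    expect_put(e);
    expect_put(e);
}
```

    If the caller never drops its own reference after a successful insert,
    the count goes 1, 3, then back to 1 after unlink, and the object is
    never freed.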

    Signed-off-by: Liping Zhang
    Signed-off-by: Pablo Neira Ayuso

    Liping Zhang
     
  • The 'name' field in struct nf_conntrack_expect_policy{} is not a
    pointer, so testing it for non-NULL always evaluates to true. Even if
    the name is empty, the slash is always displayed, as follows:
    # cat /proc/net/nf_conntrack_expect
    297 l3proto = 2 proto=6 src=1.1.1.1 dst=2.2.2.2 sport=1 dport=1025 ftp/
                                                                          ^
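
    A small sketch of the difference, assuming the name is an embedded
    character array: the array is never NULL, so the meaningful test is on
    its first character. struct expect_policy and format_helper() are
    hypothetical and only illustrate the idea.

```c
#include <stdio.h>

#define NAME_LEN 16

/* Hypothetical stand-in for a policy entry with an embedded name buffer. */
struct expect_policy {
    char name[NAME_LEN];
};

/*
 * Formats "helper" or "helper/class" into out.
 * The test is on the first character of the array: the array itself
 * can never be NULL, so a pointer-style check would always pass.
 */
void format_helper(char *out, size_t outlen, const char *helper,
                   const struct expect_policy *policy)
{
    if (policy->name[0] != '\0')
        snprintf(out, outlen, "%s/%s", helper, policy->name);
    else
        snprintf(out, outlen, "%s", helper);
}
```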

    Fixes: 3a8fc53a45c4 ("netfilter: nf_ct_helper: allocate 16 bytes for the helper and policy names")
    Signed-off-by: Liping Zhang
    Signed-off-by: Pablo Neira Ayuso

    Liping Zhang
     
  • Commit 52253db924d1 ("sctp: also point GSO head_skb to the sk when
    it's available") used event->chunk->head_skb to get the head_skb in
    sctp_ulpevent_set_owner().

    But at that moment, the event->chunk was NULL, as it cloned the skb
    in sctp_ulpevent_make_rcvmsg(). Therefore, that patch didn't really
    work.

    This patch is to move the event->chunk initialization before calling
    sctp_ulpevent_receive_data() so that it uses event->chunk when it's
    valid.

    Fixes: 52253db924d1 ("sctp: also point GSO head_skb to the sk when it's available")
    Signed-off-by: Xin Long
    Acked-by: Marcelo Ricardo Leitner
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Xin Long
     
  • When having skbs on ingress with CHECKSUM_COMPLETE, tc BPF programs don't
    push the rcsum of the mac header back in and, after the BPF run, pull it
    out again, as opposed to some other subsystems (ovs, for example).

    For cases like q-in-q, meaning when a vlan tag for offloading is already
    present and we're about to push another one, then skb_vlan_push() pushes the
    inner one into the skb, increasing mac header and skb_postpush_rcsum()'ing
    the 4 bytes vlan header diff. Likewise, for the reverse operation in
    skb_vlan_pop() for the case where vlan header needs to be pulled out of the
    skb, we're decreasing the mac header and skb_postpull_rcsum()'ing the 4 bytes
    rcsum of the vlan header that was removed.

    However mangling the rcsum here will lead to hw csum failure for BPF case,
    since we're pulling or pushing data that was not part of the current rcsum.
    Changing tc BPF programs in general to push/pull rcsum around BPF_PROG_RUN()
    is also not really an option since current behaviour is ABI by now, but apart
    from that would also mean to do quite a bit of useless work in the sense that
    usually 12 bytes need to be rcsum pushed/pulled also when we don't need to
    touch this vlan related corner case. One way to fix it would be to push the
    necessary rcsum fixup down into vlan helpers that are (mostly) slow-path
    anyway.

    Fixes: 4e10df9a60d9 ("bpf: introduce bpf_skb_vlan_push/pop() helpers")
    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • bpf_skb_store_bytes() invocations above L2 header need BPF_F_RECOMPUTE_CSUM
    flag for updates, so that CHECKSUM_COMPLETE will be fixed up along the way.
    Where we ran into an issue with bpf_skb_store_bytes() is when we did a
    single-byte update on the IPv6 hoplimit despite using BPF_F_RECOMPUTE_CSUM
    flag; simple ping via ICMPv6 triggered a hw csum failure as a result. The
    underlying issue has been tracked down to a buffer alignment issue.

    Meaning, that csum_partial() computations via skb_postpull_rcsum() and
    skb_postpush_rcsum() pair invoked had a wrong result since they operated on
    an odd address for the hoplimit, while other computations were done on an
    even address. This mix doesn't work as-is with skb_postpull_rcsum(),
    skb_postpush_rcsum() pair as it always expects at least half-word alignment
    of input buffers, which is normally the case. Thus, instead of these helpers
    using csum_sub() and (implicitly) csum_add(), we need to use csum_block_sub(),
    csum_block_add(), respectively. For unaligned offsets, they rotate the sum
    to align it to a half-word boundary again, otherwise they work the same as
    csum_sub() and csum_add().

    Adding __skb_postpull_rcsum(), __skb_postpush_rcsum() variants that take the
    offset as an input and adapting bpf_skb_store_bytes() to them fixes the hw
    csum failures again. The skb_postpull_rcsum(), skb_postpush_rcsum() helpers
    use a 0 constant for offset so that the compiler optimizes the offset & 1
    test away and generates the same code as with csum_sub()/_add().
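
    The effect of the offset can be shown with a simplified model. The
    sketch below uses a 16-bit big-endian ones' complement sum rather than
    the kernel's 32-bit __wsum helpers, and every function name in it is
    hypothetical. The idea it illustrates is the one described above: a
    partial sum computed at an odd offset must have its bytes swapped
    before it is added to or subtracted from the running sum.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold carries back into the low 16 bits (ones' complement addition). */
static uint16_t fold32(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Sum of big-endian 16-bit words; a trailing odd byte is zero-padded. */
uint16_t csum16(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (len & 1)
        sum += (uint32_t)buf[len - 1] << 8;
    return fold32(sum);
}

static uint16_t csum_add16(uint16_t a, uint16_t b)
{
    return fold32((uint32_t)a + b);
}

static uint16_t csum_sub16(uint16_t a, uint16_t b)
{
    return csum_add16(a, (uint16_t)~b);
}

static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Add a partial sum that was computed on data starting at 'offset'. */
uint16_t csum_block_add16(uint16_t sum, uint16_t part, size_t offset)
{
    if (offset & 1)
        part = swap16(part);   /* realign to the word boundary */
    return csum_add16(sum, part);
}

/* Subtract a partial sum that was computed on data starting at 'offset'. */
uint16_t csum_block_sub16(uint16_t sum, uint16_t part, size_t offset)
{
    if (offset & 1)
        part = swap16(part);
    return csum_sub16(sum, part);
}

/* Rewrite one byte and fix up the running sum without a full recompute. */
uint16_t csum_update_byte(uint16_t sum, uint8_t *buf, size_t offset,
                          uint8_t value)
{
    sum = csum_block_sub16(sum, csum16(&buf[offset], 1), offset);
    buf[offset] = value;
    return csum_block_add16(sum, csum16(&buf[offset], 1), offset);
}
```

    For an 8-byte buffer whose last byte (offset 7, an odd address) is
    changed from 0x40 to 0x3f, the incremental update matches a full
    recomputation only because of the swap; without it the removed and
    added bytes land in the wrong half of the word.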

    Fixes: 608cd71a9c7c ("tc: bpf: generalize pedit action")
    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • Follow-up to commit f8ffad69c9f8 ("bpf: add skb_postpush_rcsum and fix
    dev_forward_skb occasions") to fix an issue for dev_queue_xmit() redirect
    locations which need CHECKSUM_COMPLETE fixups on ingress.

    For the same reasons as described in f8ffad69c9f8 already, we of course
    also need this here, since dev_queue_xmit() on a veth device will let us
    end up in the dev_forward_skb() helper again to cross namespaces.

    Latter then calls into skb_postpull_rcsum() to pull out L2 header, so
    that netif_rx_internal() sees CHECKSUM_COMPLETE as it is expected. That
    is, CHECKSUM_COMPLETE on ingress covering L2 _payload_, not L2 headers.

    Also here we have to address bpf_redirect() and bpf_clone_redirect().

    Fixes: 3896d655f4d4 ("bpf: introduce bpf_clone_redirect() helper")
    Fixes: 27b29f63058d ("bpf: add bpf_redirect() helper")
    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • Since 'ss' always adds TCPF_CLOSE to idiag_states flags, sctp_diag can't
    rely upon TCPF_LISTEN flag solely being present when listening sockets
    are requested.
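
    As a rough illustration of why an exact comparison breaks once another
    bit is always set, consider the following sketch. The state bits and
    function names are hypothetical and do not reproduce the sctp_diag
    code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state bits, one per socket state. */
#define STATE_CLOSE  (1u << 0)
#define STATE_LISTEN (1u << 1)

/* Exact match: fails as soon as any other bit accompanies LISTEN. */
bool wants_listening_exact(uint32_t states)
{
    return states == STATE_LISTEN;
}

/* Bit test: true whenever LISTEN is requested, whatever else is set. */
bool wants_listening(uint32_t states)
{
    return (states & STATE_LISTEN) != 0;
}
```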

    Signed-off-by: Phil Sutter
    Signed-off-by: David S. Miller

    Phil Sutter
     
  • The asoc's timer value is not kept in asoc->timeouts array but in its
    primary transport instead.

    Furthermore, we must export the timer only if it is pending, otherwise
    the value will underrun when stored in an unsigned variable and
    user space will only see a very large timeout value.
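
    The underrun is ordinary unsigned wrap-around. A minimal sketch,
    assuming tick counters stored in 32-bit unsigned variables; struct
    timer_view and timer_remaining() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical timer snapshot; 'expires' is an absolute tick count. */
struct timer_view {
    bool pending;
    uint32_t expires;
};

/*
 * Remaining ticks to report, or 0 when the timer is not armed.
 * Without the pending test, a stale 'expires' in the past would make
 * the unsigned subtraction wrap around to a huge value.
 */
uint32_t timer_remaining(const struct timer_view *t, uint32_t now)
{
    if (!t->pending)
        return 0;
    return t->expires - now;
}
```

    For example, a stale expiry of 100 at tick 150 would otherwise be
    reported as 4294967246 rather than 0.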

    Signed-off-by: Phil Sutter
    Signed-off-by: David S. Miller

    Phil Sutter
     

08 Aug, 2016

3 commits

  • Do not drop packet when CSeq is 0 as 0 is also a valid value for CSeq.

    simple_strtoul() will return 0 either when all digits are 0
    or if there are no digits at all. Therefore when simple_strtoul()
    returns 0 we check if first character is digit 0 or not.
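
    The same disambiguation can be sketched in userspace C with strtoul()
    standing in for simple_strtoul(). Note that strtoul() also accepts
    leading whitespace and a sign, which the kernel helper does not, so
    this is an approximation. parse_cseq() is a hypothetical name.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Parses a leading decimal number from text.
 * A result of 0 is ambiguous: it is returned both for "0" and when no
 * digits were consumed. Checking that the first character is a digit
 * tells the two cases apart.
 */
bool parse_cseq(const char *text, unsigned long *cseq)
{
    unsigned long value = strtoul(text, NULL, 10);

    if (value == 0 && !isdigit((unsigned char)text[0]))
        return false;
    *cseq = value;
    return true;
}
```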

    Signed-off-by: Christophe Leroy
    Signed-off-by: Pablo Neira Ayuso

    Christophe Leroy
     
  • If we find a matching element that is inactive with no descendants, we
    jump to the found label, then crash because of a NULL dereference on
    the left branch.

    Fix this by checking that the element is active and not an interval end
    and skipping the logic that only applies to the tree iteration.

    Signed-off-by: Pablo Neira Ayuso
    Tested-by: Anders K. Pedersen

    Pablo Neira Ayuso
     
  • Commit 96d1327ac2e3 ("netfilter: h323: Use mod_timer instead of
    set_expect_timeout") just simplified the source code from

        if (!del_timer(&exp->timeout))
                return 0;
        add_timer(&exp->timeout);

    to

        mod_timer(&exp->timeout, jiffies + info->timeout * HZ);

    This is not correct, and introduces a race condition:

        CPU0                         CPU1
         -                           timer expire
        process_rcf                  expectation_timed_out
        lock(exp_lock)               -
        find_exp                     waiting exp_lock...
        re-activate timer!!          waiting exp_lock...
        unlock(exp_lock)             lock(exp_lock)
         -                           unlink expect
         -                           free(expect)
         -                           unlock(exp_lock)

    So when the timer expires again, we will access the memory that
    was already freed.

    Replace mod_timer with mod_timer_pending here to fix this problem.
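
    The behavioural difference between the two calls can be modelled
    without any kernel code. In the sketch below, fake_mod_timer() always
    arms the timer, while fake_mod_timer_pending() only updates a timer
    that is still pending. The struct and both functions are hypothetical
    models of the semantics, not the kernel implementations.

```c
#include <stdbool.h>

/* Hypothetical timer model: just an armed flag and an expiry value. */
struct fake_timer {
    bool pending;
    unsigned long expires;
};

/* Models mod_timer(): (re)arms the timer whether or not it was pending. */
void fake_mod_timer(struct fake_timer *t, unsigned long expires)
{
    t->expires = expires;
    t->pending = true;
}

/*
 * Models mod_timer_pending(): updates the expiry only if the timer is
 * still pending, so a timer that already fired stays inactive.
 * Returns true when the timer was updated.
 */
bool fake_mod_timer_pending(struct fake_timer *t, unsigned long expires)
{
    if (!t->pending)
        return false;
    t->expires = expires;
    return true;
}
```

    In the race shown in the message, the timer has already fired when
    CPU0 touches it, so only the second variant leaves it inactive.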

    Fixes: 96d1327ac2e3 ("netfilter: h323: Use mod_timer instead of set_expect_timeout")
    Cc: Gao Feng
    Signed-off-by: Liping Zhang
    Signed-off-by: Pablo Neira Ayuso

    Liping Zhang
     

07 Aug, 2016

1 commit

  • …kernel/git/jberg/mac80211

    Johannes Berg says:

    ====================
    First set of fixes for the current cycle:
    * fix 80+80 bandwidth warning
    * fix powersave with mac80211 TXQ implementation
    * use correct way to free SKBs from multicast buffering
    * mesh: fix operation ordering to work with all drivers
    * mesh: end service period even when peer goes away
    * mesh: correct HT opmode validity checks
    * pass hw pointer from mac80211 to driver in TPT method,
    fixing a bug (in a bit the wrong way, but that's what
    we have right now)
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>

    David S. Miller
     

06 Aug, 2016

3 commits

  • Panic occurs when issuing "cat /proc/net/route" whilst
    populating FIB with > 1M routes.

    Use of cached node pointer in fib_route_get_idx is unsafe.

    BUG: unable to handle kernel paging request at ffffc90001630024
    IP: [] leaf_walk_rcu+0x10/0xe0
    PGD 11b08d067 PUD 11b08e067 PMD dac4b067 PTE 0
    Oops: 0000 [#1] SMP
    Modules linked in: nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscac
    snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep virti
    acpi_cpufreq button parport_pc ppdev lp parport autofs4 ext4 crc16 mbcache jbd
    tio_ring virtio floppy uhci_hcd ehci_hcd usbcore usb_common libata scsi_mod
    CPU: 1 PID: 785 Comm: cat Not tainted 4.2.0-rc8+ #4
    Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
    task: ffff8800da1c0bc0 ti: ffff88011a05c000 task.ti: ffff88011a05c000
    RIP: 0010:[] [] leaf_walk_rcu+0x10/0xe0
    RSP: 0018:ffff88011a05fda0 EFLAGS: 00010202
    RAX: ffff8800d8a40c00 RBX: ffff8800da4af940 RCX: ffff88011a05ff20
    RDX: ffffc90001630020 RSI: 0000000001013531 RDI: ffff8800da4af950
    RBP: 0000000000000000 R08: ffff8800da1f9a00 R09: 0000000000000000
    R10: ffff8800db45b7e4 R11: 0000000000000246 R12: ffff8800da4af950
    R13: ffff8800d97a74c0 R14: 0000000000000000 R15: ffff8800d97a7480
    FS: 00007fd3970e0700(0000) GS:ffff88011fd00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: ffffc90001630024 CR3: 000000011a7e4000 CR4: 00000000000006e0
    Stack:
    ffffffff814d00d3 0000000000000000 ffff88011a05ff20 ffff8800da1f9a00
    ffffffff811dd8b9 0000000000000800 0000000000020000 00007fd396f35000
    ffffffff811f8714 0000000000003431 ffffffff8138dce0 0000000000000f80
    Call Trace:
    [] ? fib_route_seq_start+0x93/0xc0
    [] ? seq_read+0x149/0x380
    [] ? fsnotify+0x3b4/0x500
    [] ? process_echoes+0x70/0x70
    [] ? proc_reg_read+0x47/0x70
    [] ? __vfs_read+0x23/0xd0
    [] ? rw_verify_area+0x52/0xf0
    [] ? vfs_read+0x81/0x120
    [] ? SyS_read+0x42/0xa0
    [] ? entry_SYSCALL_64_fastpath+0x16/0x75
    Code: 48 85 c0 75 d8 f3 c3 31 c0 c3 f3 c3 66 66 66 66 66 66 2e 0f 1f 84 00 00
    a 04 89 f0 33 02 44 89 c9 48 d3 e8 0f b6 4a 05 49 89
    RIP [] leaf_walk_rcu+0x10/0xe0
    RSP
    CR2: ffffc90001630024

    Signed-off-by: Dave Forster
    Acked-by: Alexander Duyck
    Signed-off-by: David S. Miller

    David Forster
     
  • Inside the kafs filesystem it is possible to occasionally have a call
    processed and terminated before we've had a chance to check whether we need
    to clean up the rx queue for that call because afs_send_simple_reply() ends
    the call when it is done, but this is done in a workqueue item that might
    happen to run to completion before afs_deliver_to_call() completes.

    Further, it is possible for rxrpc_kernel_send_data() to be called to send a
    reply before the last request-phase data skb is released. The rxrpc skb
    destructor is where the ACK processing is done and the call state is
    advanced upon release of the last skb. ACK generation is also deferred to
    a work item because it's possible that the skb destructor is not called in
    a context where kernel_sendmsg() can be invoked.

    To this end, the following changes are made:

    (1) kernel_rxrpc_data_consumed() is added. This should be called whenever
    an skb is emptied so as to crank the ACK and call states. This does
    not release the skb, however. kernel_rxrpc_free_skb() must now be
    called to achieve that. These together replace
    rxrpc_kernel_data_delivered().

    (2) kernel_rxrpc_data_consumed() is wrapped by afs_data_consumed().

    This makes afs_deliver_to_call() easier to work as the skb can simply
    be discarded unconditionally here without trying to work out what the
    return value of the ->deliver() function means.

    The ->deliver() functions can, via afs_data_complete(),
    afs_transfer_reply() and afs_extract_data() mark that an skb has been
    consumed (thereby cranking the state) without the need to
    conditionally free the skb to make sure the state is correct on an
    incoming call for when the call processor tries to send the reply.

    (3) rxrpc_recvmsg() now has to call kernel_rxrpc_data_consumed() when it
    has finished with a packet and MSG_PEEK isn't set.

    (4) rxrpc_packet_destructor() no longer calls rxrpc_hard_ACK_data().

    Because of this, we no longer need to clear the destructor and put the
    call before we free the skb in cases where we don't want the ACK/call
    state to be cranked.

    (5) The ->deliver() call-type callbacks are made to return -EAGAIN rather
    than 0 if they expect more data (afs_extract_data() returns -EAGAIN to
    the delivery function already), and the caller is now responsible for
    producing an abort if that was the last packet.

    (6) There are many bits of unmarshalling code where:

    ret = afs_extract_data(call, skb, last, ...);
    switch (ret) {
    case 0: break;
    case -EAGAIN: return 0;
    default: return ret;
    }

    is to be found. As -EAGAIN can now be passed back to the caller, we
    now just return if ret < 0:

    ret = afs_extract_data(call, skb, last, ...);
    if (ret < 0)
    return ret;

    (7) Checks for trailing data and empty final data packets have been
    consolidated as afs_data_complete(). So:

    if (skb->len > 0)
    return -EBADMSG;
    if (!last)
    return 0;

    becomes:

    ret = afs_data_complete(call, skb, last);
    if (ret < 0)
    return ret;

    (8) afs_transfer_reply() now checks the amount of data it has against the
    amount of data desired and the amount of data in the skb and returns
    an error to induce an abort if we don't get exactly what we want.

    Without these changes, the following oops can occasionally be observed,
    particularly if some printks are inserted into the delivery path:

    general protection fault: 0000 [#1] SMP
    Modules linked in: kafs(E) af_rxrpc(E) [last unloaded: af_rxrpc]
    CPU: 0 PID: 1305 Comm: kworker/u8:3 Tainted: G E 4.7.0-fsdevel+ #1303
    Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
    Workqueue: kafsd afs_async_workfn [kafs]
    task: ffff88040be041c0 ti: ffff88040c070000 task.ti: ffff88040c070000
    RIP: 0010:[] [] __lock_acquire+0xcf/0x15a1
    RSP: 0018:ffff88040c073bc0 EFLAGS: 00010002
    RAX: 6b6b6b6b6b6b6b6b RBX: 0000000000000000 RCX: ffff88040d29a710
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88040d29a710
    RBP: ffff88040c073c70 R08: 0000000000000001 R09: 0000000000000001
    R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
    R13: 0000000000000000 R14: ffff88040be041c0 R15: ffffffff814c928f
    FS: 0000000000000000(0000) GS:ffff88041fa00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fa4595f4750 CR3: 0000000001c14000 CR4: 00000000001406f0
    Stack:
    0000000000000006 000000000be04930 0000000000000000 ffff880400000000
    ffff880400000000 ffffffff8108f847 ffff88040be041c0 ffffffff81050446
    ffff8803fc08a920 ffff8803fc08a958 ffff88040be041c0 ffff88040c073c38
    Call Trace:
    [] ? mark_held_locks+0x5e/0x74
    [] ? __local_bh_enable_ip+0x9b/0xa1
    [] ? trace_hardirqs_on_caller+0x16d/0x189
    [] lock_acquire+0x122/0x1b6
    [] ? lock_acquire+0x122/0x1b6
    [] ? skb_dequeue+0x18/0x61
    [] _raw_spin_lock_irqsave+0x35/0x49
    [] ? skb_dequeue+0x18/0x61
    [] skb_dequeue+0x18/0x61
    [] afs_deliver_to_call+0x344/0x39d [kafs]
    [] afs_process_async_call+0x4c/0xd5 [kafs]
    [] afs_async_workfn+0xe/0x10 [kafs]
    [] process_one_work+0x29d/0x57c
    [] worker_thread+0x24a/0x385
    [] ? rescuer_thread+0x2d0/0x2d0
    [] kthread+0xf3/0xfb
    [] ret_from_fork+0x1f/0x40
    [] ? kthread_create_on_node+0x1cf/0x1cf

    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    David Howells
     
  • net_device->ndo_set_rx_headroom (introduced in
    871b642adebe300be2e50aa5f65a418510f636ec) says

    "Setting a negative value resets the rx headroom
    to the default value".

    It seems that the OVS implementation in
    3a927bc7cf9d0fbe8f4a8189dd5f8440228f64e7 overlooked this and sets
    dev->needed_headroom unconditionally.

    This doesn't have an immediate effect, but can mess up later
    LL_RESERVED_SPACE calculations, such as done in
    net/ipv6/mcast.c:mld_newpack. For reference, this issue was found
    from a skb_panic raised there after the length calculations had given
    the wrong result.
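
    A minimal sketch of a setter that honours the documented contract,
    assuming the headroom is stored in an unsigned 16-bit field.
    DEFAULT_HEADROOM, struct fake_dev and set_rx_headroom() are
    hypothetical and do not reproduce the OVS code.

```c
/* Hypothetical default used when the caller asks for a reset. */
#define DEFAULT_HEADROOM 16

/* Hypothetical device with an unsigned headroom field. */
struct fake_dev {
    unsigned short needed_headroom;
};

/*
 * A negative request means "reset to default". Storing it directly
 * would convert it to a very large unsigned value instead.
 */
void set_rx_headroom(struct fake_dev *dev, int new_hr)
{
    if (new_hr < 0)
        new_hr = DEFAULT_HEADROOM;
    dev->needed_headroom = (unsigned short)new_hr;
}
```

    An unconditional assignment of -1 to a 16-bit unsigned field yields
    65535, which is the kind of value that can upset later length
    calculations.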

    Note that the other current users of this interface
    (drivers/net/tun.c:tun_set_headroom and
    drivers/net/veth.c:veth_set_rx_headroom) both check this
    correctly and thus need no modification.
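
    As a rough illustration of the check described above, here is a
    minimal userspace sketch. It is not the kernel code: struct
    model_net_device, both model_* functions and the default of 0 are
    hypothetical stand-ins, and the 16-bit unsigned field type is an
    assumption made for this model.

```c
/* Hypothetical stand-in for the single net_device field involved. */
struct model_net_device {
	unsigned short needed_headroom; /* assumed 16-bit unsigned */
};

/* Assumption for this model only: the default rx headroom is 0. */
#define MODEL_DEFAULT_HEADROOM 0

/* Checked variant: a negative request resets the headroom to the default. */
void model_set_rx_headroom(struct model_net_device *dev, int new_hr)
{
	if (new_hr < 0)
		dev->needed_headroom = MODEL_DEFAULT_HEADROOM;
	else
		dev->needed_headroom = (unsigned short)new_hr;
}

/* Unchecked variant: stores whatever it is given, so -1 wraps around. */
void model_set_rx_headroom_unchecked(struct model_net_device *dev, int new_hr)
{
	dev->needed_headroom = (unsigned short)new_hr;
}
```

    In this model, passing -1 to the unchecked variant leaves a very
    large headroom value behind, which is the kind of input that would
    skew a later space calculation.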

    Thanks to Ben for some pointers from the crash dumps!

    Cc: Benjamin Poirier
    Cc: Paolo Abeni
    Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1361414
    Signed-off-by: Ian Wienand
    Signed-off-by: David S. Miller

    Ian Wienand
     

05 Aug, 2016

4 commits

  • The variable is added to give the driver easy access to
    its own hw->priv when the op is invoked.

    This fixes a crash in wlcore because it was relying on a
    station pointer that wasn't initialized yet. It's the wrong
    way to fix the crash, but it solves the problem for now and
    it does make sense to have the hw pointer here.

    Signed-off-by: Maxim Altshul
    [rewrite commit message, fix indentation]
    Signed-off-by: Johannes Berg

    Maxim Altshul
     
  • Previously, NL80211_MESHCONF_HT_OPMODE validation rejected correct
    flag combinations, e.g. IEEE80211_HT_OP_MODE_PROTECTION_NONHT_MIXED |
    IEEE80211_HT_OP_MODE_NON_HT_STA_PRSNT.

    Doing just a range-check allows setting flags that don't exist (0x8)
    and invalid flag combinations.

    Implements some checks based on IEEE 802.11 2012 8.4.2.59 "HT
    Operation element".
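
    To make the difference between a range check and a flag check
    concrete, here is a small userspace sketch. It is not the patch
    itself: the bit values are assumed for illustration, the function
    name ht_opmode_is_valid is hypothetical, and the rule pairing the
    protection mode with the "non-HT STA present" bit is one plausible
    reading of the element, simplified.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit layout assumed for this sketch. */
#define IEEE80211_HT_OP_MODE_PROTECTION             0x0003
#define IEEE80211_HT_OP_MODE_PROTECTION_NONE        0
#define IEEE80211_HT_OP_MODE_PROTECTION_NONMEMBER   1
#define IEEE80211_HT_OP_MODE_PROTECTION_20MHZ       2
#define IEEE80211_HT_OP_MODE_PROTECTION_NONHT_MIXED 3
#define IEEE80211_HT_OP_MODE_NON_GF_STA_PRSNT       0x0004
#define IEEE80211_HT_OP_MODE_NON_HT_STA_PRSNT       0x0010

/* Hypothetical validator: reject unknown bits, then check consistency. */
bool ht_opmode_is_valid(uint16_t ht_opmode)
{
	const uint16_t known = IEEE80211_HT_OP_MODE_PROTECTION |
			       IEEE80211_HT_OP_MODE_NON_GF_STA_PRSNT |
			       IEEE80211_HT_OP_MODE_NON_HT_STA_PRSNT;
	bool non_ht = (ht_opmode & IEEE80211_HT_OP_MODE_NON_HT_STA_PRSNT) != 0;

	/* A plain range check would let bits such as 0x8 through. */
	if ((ht_opmode & ~known) != 0)
		return false;

	switch (ht_opmode & IEEE80211_HT_OP_MODE_PROTECTION) {
	case IEEE80211_HT_OP_MODE_PROTECTION_NONE:
	case IEEE80211_HT_OP_MODE_PROTECTION_20MHZ:
		/* Assumed rule: these modes imply no non-HT STA is present. */
		return !non_ht;
	default:
		/* NONMEMBER and NONHT_MIXED: assumed to require the bit. */
		return non_ht;
	}
}
```

    Under these assumptions the combination quoted above (0x13) is
    accepted, while the nonexistent flag 0x8 is rejected.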

    Signed-off-by: Masashi Honma
    [reword commit message, simplify a bit]
    Signed-off-by: Johannes Berg

    Masashi Honma
     
  • If a QoS frame with the EOSP (end of service period) subfield set to 1,
    sent by the local peer, was not acked by the remote peer, the local peer
    did not end the MPSP. This prevents the local peer from going to DOZE
    state. And if the remote peer goes away without closing the connection,
    the local peer stays in AWAKE state and wastes battery.

    Signed-off-by: Masashi Honma
    Acked-by: Bob Copeland
    Signed-off-by: Johannes Berg

    Masashi Honma
     
  • The code currently assumes that buffered multicast PS frames don't have
    a pending ACK frame for tx status reporting.
    However, hostapd sends a broadcast deauth frame on teardown for which tx
    status is requested. This can lead to the "Have pending ack frames"
    warning on module reload.
    Fix this by using ieee80211_free_txskb/ieee80211_purge_tx_queue.

    Cc: stable@vger.kernel.org
    Signed-off-by: Felix Fietkau
    Signed-off-by: Johannes Berg

    Felix Fietkau
     

04 Aug, 2016

1 commit

  • ovs_ct_find_existing() issues a warning if an existing conntrack entry
    classified as IP_CT_NEW is found, with the premise that this should
    not happen. However, a newly confirmed, non-expected conntrack entry
    remains IP_CT_NEW as long as no reply direction traffic is seen. This
    has resulted in somewhat confusing kernel log messages. This patch
    removes this check and warning.

    Fixes: 289f2253 ("openvswitch: Find existing conntrack entry after upcall.")
    Suggested-by: Joe Stringer
    Signed-off-by: Jarno Rajahalme
    Acked-by: Joe Stringer
    Signed-off-by: David S. Miller

    Jarno Rajahalme
     

03 Aug, 2016

3 commits

  • Pull networking fixes from David Miller:

    1) Fix several cases of missing of_node_put() calls in various
    networking drivers. From Peter Chen.

    2) Don't try to remove unconfigured VLANs in qed driver, from Yuval
    Mintz.

    3) Unbalanced locking in TIPC error handling, from Wei Yongjun.

    4) Fix lockups in CPDMA driver, from Grygorii Strashko.

    5) More MACSEC refcount et al fixes, from Sabrina Dubroca.

    6) Fix MAC address setting in r8169 during runtime suspend, from
    Chun-Hao Lin.

    7) Various printf format specifier fixes, from Heinrich Schuchardt.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (59 commits)
    qed: Fail driver load in 100g MSI mode.
    ethernet: ti: davinci_emac: add missing of_node_put after calling of_parse_phandle
    ethernet: stmicro: stmmac: add missing of_node_put after calling of_parse_phandle
    ethernet: stmicro: stmmac: dwmac-socfpga: add missing of_node_put after calling of_parse_phandle
    ethernet: renesas: sh_eth: add missing of_node_put after calling of_parse_phandle
    ethernet: renesas: ravb_main: add missing of_node_put after calling of_parse_phandle
    ethernet: marvell: pxa168_eth: add missing of_node_put after calling of_parse_phandle
    ethernet: marvell: mvpp2: add missing of_node_put after calling of_parse_phandle
    ethernet: marvell: mvneta: add missing of_node_put after calling of_parse_phandle
    ethernet: hisilicon: hns: hns_dsaf_main: add missing of_node_put after calling of_parse_phandle
    ethernet: hisilicon: hns: hns_dsaf_mac: add missing of_node_put after calling of_parse_phandle
    ethernet: cavium: octeon: add missing of_node_put after calling of_parse_phandle
    ethernet: aurora: nb8800: add missing of_node_put after calling of_parse_phandle
    ethernet: arc: emac_main: add missing of_node_put after calling of_parse_phandle
    ethernet: apm: xgene: add missing of_node_put after calling of_parse_phandle
    ethernet: altera: add missing of_node_put
    8139too: fix system hang when there is a tx timeout event.
    qed: Fix error return code in qed_resc_alloc()
    net: qlcnic: avoid superfluous assignement
    dsa: b53: remove redundant if
    ...

    Linus Torvalds
     
  • Some drivers (e.g. wl18xx) expect that the last stage in the
    de-initialization process will be stopping the beacons, similar to AP flow.
    Update ieee80211_stop_mesh() flow accordingly.
    As peers can be removed dynamically, this would not impact other drivers.

    Tested also on Ralink RT3572 chipset.

    Signed-off-by: Maital Hahn
    Signed-off-by: Yaniv Machani
    Signed-off-by: Johannes Berg

    Maital Hahn
     
  • Pull Ceph updates from Ilya Dryomov:
    "The highlights are:

    - RADOS namespace support in libceph and CephFS (Zheng Yan and
    myself). The stopgaps added in 4.5 to deny access to inodes in
    namespaces are removed and CEPH_FEATURE_FS_FILE_LAYOUT_V2 feature
    bit is now fully supported

    - A large rework of the MDS cap flushing code (Zheng Yan)

    - Handle some of ->d_revalidate() in RCU mode (Jeff Layton). We were
    overly pessimistic before, bailing at the first sight of LOOKUP_RCU

    On top of that we've got a few CephFS bug fixes, a couple of cleanups
    and Arnd's workaround for a weird genksyms issue"

    * tag 'ceph-for-4.8-rc1' of git://github.com/ceph/ceph-client: (34 commits)
    ceph: fix symbol versioning for ceph_monc_do_statfs
    ceph: Correctly return NXIO errors from ceph_llseek
    ceph: Mark the file cache as unreclaimable
    ceph: optimize cap flush waiting
    ceph: cleanup ceph_flush_snaps()
    ceph: kick cap flushes before sending other cap message
    ceph: introduce an inode flag to indicates if snapflush is needed
    ceph: avoid sending duplicated cap flush message
    ceph: unify cap flush and snapcap flush
    ceph: use list instead of rbtree to track cap flushes
    ceph: update types of some local varibles
    ceph: include 'follows' of pending snapflush in cap reconnect message
    ceph: update cap reconnect message to version 3
    ceph: mount non-default filesystem by name
    libceph: fsmap.user subscription support
    ceph: handle LOOKUP_RCU in ceph_d_revalidate
    ceph: allow dentry_lease_is_valid to work under RCU walk
    ceph: clear d_fsinfo pointer under d_lock
    ceph: remove ceph_mdsc_lease_release
    ceph: don't use ->d_time
    ...

    Linus Torvalds
     

02 Aug, 2016

2 commits