28 Sep, 2016

1 commit

  • CURRENT_TIME macro is not appropriate for filesystems as it
    doesn't use the right granularity for filesystem timestamps.
    Use current_time() instead.

    CURRENT_TIME is also not y2038 safe.

    This is also in preparation for the patch that transitions
    vfs timestamps to use 64 bit time and hence make them
    y2038 safe. As part of the effort current_time() will be
    extended to do range checks. Hence, it is necessary for all
    file system timestamps to use current_time(). Also,
    current_time() will be transitioned along with vfs to be
    y2038 safe.

    Note that whenever a single call to current_time() is used
    to change timestamps in different inodes, it is because they
    share the same time granularity.

    Signed-off-by: Deepa Dinamani
    Reviewed-by: Arnd Bergmann
    Acked-by: Felipe Balbi
    Acked-by: Steven Whitehouse
    Acked-by: Ryusuke Konishi
    Acked-by: David Sterba
    Signed-off-by: Al Viro

    Deepa Dinamani
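
The granularity point in the message above can be illustrated in userspace. The following is a minimal sketch, not the kernel's current_time(): the helper name trunc_to_granularity() and the convention of passing the granularity in nanoseconds are assumptions made for illustration. It shows how a timestamp is rounded down to what a given filesystem can store.

```c
#include <time.h>

/* Hypothetical helper: round ts down to a multiple of gran_ns nanoseconds.
 * gran_ns is assumed to lie in the range 1..1000000000. */
static struct timespec trunc_to_granularity(struct timespec ts, long gran_ns)
{
    if (gran_ns >= 1000000000L)
        ts.tv_nsec = 0;                      /* whole-second granularity */
    else if (gran_ns > 1)
        ts.tv_nsec -= ts.tv_nsec % gran_ns;  /* drop the sub-granule part */
    /* gran_ns == 1: nanosecond granularity, nothing to drop */
    return ts;
}
```

Two inodes can share one truncated value only if both use the same gran_ns, which is the condition the message states for reusing a single call.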
     

31 Aug, 2016

1 commit

  • Pull NFS client bugfixes from Trond Myklebust:
    "Highlights include:

    Stable patches:
    - Fix a refcount leak in nfs_callback_up_net
    - Fix an Oopsable condition when the flexfile pNFS driver connection
    to the DS fails
    - Fix an Oopsable condition in NFSv4.1 server callback races
    - Ensure pNFS clients stop doing I/O to the DS if their lease has
    expired, as required by the NFSv4.1 protocol

    Bugfixes:
    - Fix potential looping in the NFSv4.x migration code
    - Patch series to close callback races for OPEN, LAYOUTGET and
    LAYOUTRETURN
    - Silence WARN_ON when NFSv4.1 over RDMA is in use
    - Fix a LAYOUTCOMMIT race in the pNFS/blocks client
    - Fix pNFS timeout issues when the DS fails"

    * tag 'nfs-for-4.8-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
    NFSv4.x: Fix a refcount leak in nfs_callback_up_net
    NFS4: Avoid migration loops
    pNFS/flexfiles: Fix an Oopsable condition when connection to the DS fails
    NFSv4.1: Remove obsolete and incorrrect assignment in nfs4_callback_sequence
    NFSv4.1: Close callback races for OPEN, LAYOUTGET and LAYOUTRETURN
    NFSv4.1: Defer bumping the slot sequence number until we free the slot
    NFSv4.1: Delay callback processing when there are referring triples
    NFSv4.1: Fix Oopsable condition in server callback races
    SUNRPC: Silence WARN_ON when NFSv4.1 over RDMA is in use
    pnfs/blocklayout: update last_write_offset atomically with extents
    pNFS: The client must not do I/O to the DS if it's lease has expired
    pNFS: Handle NFS4ERR_OLD_STATEID correctly in LAYOUTSTAT calls
    pNFS/flexfiles: Set reasonable default retrans values for the data channel
    NFS: Allow the mount option retrans=0
    pNFS/flexfiles: Fix layoutstat periodic reporting

    Linus Torvalds
     

27 Aug, 2016

1 commit


26 Aug, 2016

4 commits


25 Aug, 2016

1 commit

  • Using NFSv4.1 on RDMA should be safe, so broaden the new checks in
    rpc_create().

    WARN_ON_ONCE is used, matching most other WARN call sites in clnt.c.

    Fixes: 39a9beab5acb ("rpc: share one xps between all backchannels")
    Fixes: d50039ea5ee6 ("nfsd4/rpc: move backchannel create logic...")
    Signed-off-by: Chuck Lever
    Reviewed-by: J. Bruce Fields
    Signed-off-by: Trond Myklebust

    Chuck Lever
     

24 Aug, 2016

7 commits

  • During an audit for sk_filter(), we found that rx_busy_skb handling
    in l2cap_sock_recv_cb() and l2cap_sock_recvmsg() looks not quite as
    intended.

    The assumption from commit e328140fdacb ("Bluetooth: Use event-driven
    approach for handling ERTM receive buffer") is that errors returned
    from sock_queue_rcv_skb() are due to receive buffer shortage. However,
    nothing should prevent doing a setsockopt() with SO_ATTACH_FILTER on
    the socket, that could drop some of the incoming skbs when handled in
    sock_queue_rcv_skb().

    In that case sock_queue_rcv_skb() will return -EPERM, propagated
    from sk_filter(), and in L2CAP_MODE_ERTM mode the wrong assumption was
    that we failed due to the receive buffer being full. From that point onwards,
    due to the to-be-dropped skb being held in rx_busy_skb, we cannot make
    any forward progress as rx_busy_skb is never cleared from l2cap_sock_recvmsg(),
    due to the filter drop verdict over and over coming from sk_filter().
    Meanwhile, in l2cap_sock_recv_cb() all new incoming skbs are being
    dropped due to rx_busy_skb being occupied.

    Instead, just use __sock_queue_rcv_skb() where an error really tells that
    there's a receive buffer issue. Split the sk_filter() and enable it for
    non-segmented modes at queuing time since at this point in time the skb has
    already been through the ERTM state machine and it has been acked, so dropping
    is not allowed. Instead, for ERTM and streaming mode, call sk_filter() in
    l2cap_data_rcv() so the packet can be dropped before the state machine sees it.

    Fixes: e328140fdacb ("Bluetooth: Use event-driven approach for handling ERTM receive buffer")
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Mat Martineau
    Acked-by: Willem de Bruijn
    Signed-off-by: Marcel Holtmann

    Daniel Borkmann
     
  • In hci_req_sync_complete the event skb is referenced in hdev->req_skb.
    It is used (via hci_req_run_skb) from either __hci_cmd_sync_ev which will
    pass the skb to the caller, or __hci_req_sync which leaks.

    unreferenced object 0xffff880005339a00 (size 256):
    comm "kworker/u3:1", pid 1011, jiffies 4294671976 (age 107.389s)
    backtrace:
    [] kmemleak_alloc+0x49/0xa0
    [] kmem_cache_alloc+0x128/0x180
    [] skb_clone+0x4f/0xa0
    [] hci_event_packet+0xc1/0x3290
    [] hci_rx_work+0x18b/0x360
    [] process_one_work+0x14a/0x440
    [] worker_thread+0x43/0x4d0
    [] kthread+0xc4/0xe0
    [] ret_from_fork+0x1f/0x40
    [] 0xffffffffffffffff

    Signed-off-by: Frédéric Dalleau
    Signed-off-by: Marcel Holtmann

    Frederic Dalleau
     
  • inet_diag_find_one_icsk takes a reference to a socket that is not
    released if sock_diag_destroy returns an error. Fix by changing
    tcp_diag_destroy to manage the refcnt for all cases and remove
    the sock_put calls from tcp_abort.

    Fixes: c1e64e298b8ca ("net: diag: Support destroying TCP sockets")
    Reported-by: Lorenzo Colitti
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • After commit ca065d0cf80f ("udp: no longer use SLAB_DESTROY_BY_RCU")
    we do not need this special allocation mode anymore, even if it is
    harmless.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • The function sctp_diag_dump_one() currently performs a memcpy()
    of 64 bytes from a 16 byte field into another 16 byte field. Fix
    by using sizeof to obtain the correct size instead of a
    hard-coded constant.

    Fixes: 8f840e47f190 ("sctp: add the sctp_diag.c file")
    Signed-off-by: Lance Richardson
    Reviewed-by: Xin Long
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Lance Richardson
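
As an illustration of the sizeof idiom the message describes, here is a minimal userspace sketch. The structure addr_rec and the function copy_addr() are hypothetical and are not taken from sctp_diag.c.

```c
#include <string.h>

/* Hypothetical stand-in for a structure with a fixed 16-byte address field. */
struct addr_rec {
    unsigned char addr[16];
    unsigned int  port;
};

/* Copy only the address field. The size is taken from the destination
 * field itself, so it stays correct if the field is ever resized and can
 * never run past the end of the 16-byte array. */
static void copy_addr(struct addr_rec *dst, const struct addr_rec *src)
{
    memcpy(dst->addr, src->addr, sizeof(dst->addr));
}
```

With a hard-coded 64 in place of sizeof(dst->addr), the same call would read and write 48 bytes beyond each field.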
     
  • When sending an ack in SYN_RECV state, we must scale the offered
    window if wscale option was negotiated and accepted.

    Tested:
    The following packetdrill test demonstrates the issue:

    0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
    +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0

    +0 bind(3, ..., ...) = 0
    +0 listen(3, 1) = 0

    // Establish a connection.
    +0 < S 0:0(0) win 20000
    +0 > S. 0:0(0) ack 1 win 28960

    +0 < . 1:11(10) ack 1 win 156
    // check that window is properly scaled !
    +0 > . 1:1(0) ack 1 win 226

    Signed-off-by: Eric Dumazet
    Cc: Yuchung Cheng
    Cc: Neal Cardwell
    Acked-by: Yuchung Cheng
    Acked-by: Neal Cardwell
    Signed-off-by: David S. Miller

    Eric Dumazet
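
The scaling rule in the message above can be sketched as follows. This is a userspace illustration, not the kernel code; the function name advertised_window() is hypothetical, and the shift value used in the note below is an assumption because the negotiated options are not visible in the test.

```c
#include <stdint.h>

/* Value placed in the 16-bit TCP window field for a receive window of
 * win_bytes, given the window-scale shift we advertised (0 if the option
 * was not negotiated). The result is clamped to the field's maximum. */
static uint16_t advertised_window(uint32_t win_bytes, unsigned int rcv_wscale)
{
    uint32_t scaled = win_bytes >> rcv_wscale;

    return scaled > 0xFFFF ? 0xFFFF : (uint16_t)scaled;
}
```

Assuming a shift of 7, a 28960-byte window is advertised as 28960 >> 7 = 226, which is the value on the last line of the test. Without scaling the field would carry 28960 unchanged.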
     
  • Laura tracked poll() [and friends] regression caused by commit
    e6afc8ace6dd ("udp: remove headers from UDP packets before queueing")

    udp_poll() needs to know if there is a valid packet in receive queue,
    even if its payload length is 0.

    Change first_packet_length() to return a signed int, and use -1
    as the indication of an empty queue.

    Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing")
    Reported-by: Laura Abbott
    Signed-off-by: Eric Dumazet
    Tested-by: Laura Abbott
    Signed-off-by: David S. Miller

    Eric Dumazet
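
The distinction between an empty queue and a queued zero-length datagram can be sketched in userspace as below. The types and function names (rx_queue, first_payload_length(), queue_readable()) are hypothetical and only model the idea; they are not the kernel's data structures.

```c
#include <stddef.h>

/* Hypothetical receive queue: payload lengths of the queued datagrams. */
struct rx_queue {
    const unsigned int *payload_len;
    size_t count;
};

/* Length of the first queued payload, or -1 if the queue is empty.
 * Returning a signed value keeps 0 available as a valid length. */
static int first_payload_length(const struct rx_queue *q)
{
    if (q->count == 0)
        return -1;
    return (int)q->payload_len[0];
}

/* A queue is readable if any datagram is queued, even a zero-length one. */
static int queue_readable(const struct rx_queue *q)
{
    return first_payload_length(q) != -1;
}
```

If 0 were used to mean "empty", a poll on a socket holding only a zero-length datagram would never report it as readable.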
     

23 Aug, 2016

3 commits

  • Encoding of the metadata was using the padded length as opposed to
    the real length of the data, which is a bug per the specification.
    This has not been an issue to date because all metadata specified
    so far have been 32 bits wide, where the aligned and real data lengths
    are the same.
    This also includes a bug fix for validating the length of a u16 field.
    But since there is no metadata of size u16 yet, we are fine to include it
    here.

    While at it get rid of magic numbers.

    Fixes: ef6980b6becb ("net sched: introduce IFE action")
    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Jamal Hadi Salim
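
The difference between the real and the padded length can be shown with a small TLV encoder. This is a sketch under stated assumptions, not the IFE code: the layout (2-byte type, 2-byte length, length counting the header plus unpadded data, 4-byte alignment) and the names tlv_encode(), TLV_HDRLEN and TLV_ALIGN are chosen here for illustration.

```c
#include <stdint.h>
#include <string.h>

#define TLV_HDRLEN 4u                        /* 2-byte type + 2-byte length */
#define TLV_ALIGN(n) (((n) + 3u) & ~3u)      /* round up to a multiple of 4 */

/* Hypothetical encoder: writes one TLV into buf and returns the padded
 * number of bytes consumed. The length field carries the unpadded size.
 * The caller must provide at least TLV_ALIGN(TLV_HDRLEN + dlen) bytes,
 * and the total is assumed to fit in 16 bits. */
static unsigned int tlv_encode(uint8_t *buf, uint16_t type,
                               const void *val, uint16_t dlen)
{
    unsigned int totlen = TLV_HDRLEN + dlen; /* real length, no padding */
    unsigned int padded = TLV_ALIGN(totlen);

    memset(buf, 0, padded);                  /* zero the padding bytes */
    buf[0] = (uint8_t)(type >> 8);           /* big-endian type */
    buf[1] = (uint8_t)(type & 0xFF);
    buf[2] = (uint8_t)(totlen >> 8);         /* big-endian real length */
    buf[3] = (uint8_t)(totlen & 0xFF);
    memcpy(buf + TLV_HDRLEN, val, dlen);
    return padded;
}
```

For a 4-byte value both lengths are 8, which is why the bug stayed hidden. For a 2-byte value the length field holds 6 while the encoder advances by 8.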
     
  • In b8247f095e,

    "net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for local udp tunneled skbs"

    gso skbs arriving from an ingress interface that go through UDP
    tunneling, are allowed to be fragmented if the resulting encapsulated
    segments exceed the dst mtu of the egress interface.

    This aligned the behavior of gso skbs to non-gso skbs going through udp
    encapsulation path.

    However the non-gso vs gso anomaly is present also in the following
    cases of a GRE tunnel:
    - ip_gre in collect_md mode, where TUNNEL_DONT_FRAGMENT is not set
    (e.g. OvS vport-gre with df_default=false)
    - ip_gre in nopmtudisc mode, where IFLA_GRE_IGNORE_DF is set

    In both of the above cases, the non-gso skbs get fragmented, whereas the
    gso skbs (having skb_gso_network_seglen that exceeds dst mtu) get dropped,
    as they don't go through the segment+fragment code path.

    Fix: Setting IPSKB_FRAG_SEGS if the tunnel specified IP_DF bit is NOT set.

    Tunnels that do set IP_DF, will not go to fragmentation of segments.
    This preserves behavior of ip_gre in (the default) pmtudisc mode.

    Fixes: b8247f095e ("net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for local udp tunneled skbs")
    Reported-by: wenxu
    Cc: Hannes Frederic Sowa
    Signed-off-by: Shmulik Ladkani
    Tested-by: wenxu
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Shmulik Ladkani
     
  • If DAD fails with accept_dad set to 2, global addresses and host routes
    are incorrectly left in place. Even though disable_ipv6 is set,
    contrary to documentation, the addresses are not dynamically deleted
    from the interface. It is only on a subsequent link down/up that these
    are removed. The fix is not only to set the disable_ipv6 flag, but
    also to call addrconf_ifdown(), which is the action to carry out when
    disabling IPv6. This results in the addresses and routes being deleted
    immediately. The DAD failure for the LL addr is determined as before
    via netlink, or by the absence of the LL addr (which also previously
    would have had to be checked for in case of an intervening link down
    and up). As the call to addrconf_ifdown() requires an rtnl lock, the
    logic to disable IPv6 when DAD fails is moved to addrconf_dad_work().

    Previous behavior:

    root@vm1:/# sysctl net.ipv6.conf.eth3.accept_dad=2
    net.ipv6.conf.eth3.accept_dad = 2
    root@vm1:/# ip -6 addr add 2000::10/64 dev eth3
    root@vm1:/# ip link set up eth3
    root@vm1:/# ip -6 addr show dev eth3
    5: eth3: mtu 1500 qlen 1000
    inet6 2000::10/64 scope global
    valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe43:dd5a/64 scope link tentative dadfailed
    valid_lft forever preferred_lft forever
    root@vm1:/# ip -6 route show dev eth3
    2000::/64 proto kernel metric 256
    fe80::/64 proto kernel metric 256
    root@vm1:/# ip link set down eth3
    root@vm1:/# ip link set up eth3
    root@vm1:/# ip -6 addr show dev eth3
    root@vm1:/# ip -6 route show dev eth3
    root@vm1:/#

    New behavior:

    root@vm1:/# sysctl net.ipv6.conf.eth3.accept_dad=2
    net.ipv6.conf.eth3.accept_dad = 2
    root@vm1:/# ip -6 addr add 2000::10/64 dev eth3
    root@vm1:/# ip link set up eth3
    root@vm1:/# ip -6 addr show dev eth3
    root@vm1:/# ip -6 route show dev eth3
    root@vm1:/#

    Signed-off-by: Mike Manning
    Signed-off-by: David S. Miller

    Mike Manning
     

20 Aug, 2016

2 commits

  • The sk->sk_state is a bit-flag field, so it needs to be checked with
    a bit operation instead of a value comparison.

    Signed-off-by: Gao Feng
    Tested-by: Guillaume Nault
    Signed-off-by: David S. Miller

    Gao Feng
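
The difference between a value check and a bit check can be sketched as follows. The flag names and values are hypothetical and do not correspond to the kernel's actual state constants.

```c
/* Hypothetical state bits; several may be set at the same time. */
enum {
    ST_CONNECTED = 1 << 0,
    ST_BOUND     = 1 << 1,
    ST_DEAD      = 1 << 4,
};

/* Wrong for a bit field: false as soon as any other bit is also set. */
static int is_connected_by_value(unsigned int state)
{
    return state == ST_CONNECTED;
}

/* Correct: tests only the bit of interest and ignores the others. */
static int is_connected_by_bit(unsigned int state)
{
    return (state & ST_CONNECTED) != 0;
}
```

For a state of ST_CONNECTED | ST_BOUND the value check reports "not connected" while the bit check reports "connected".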
     
  • Because otherwise when crc computation is still needed it's way more
    expensive than on a linear buffer to the point that it affects
    performance.

    It's so expensive that netperf test gives a perf output as below:

    Overhead  Command    Shared Object     Symbol
    18,62%    netserver  [kernel.vmlinux]  [k] crc32_generic_shift
    2,57%     netserver  [kernel.vmlinux]  [k] __pskb_pull_tail
    1,94%     netserver  [kernel.vmlinux]  [k] fib_table_lookup
    1,90%     netserver  [kernel.vmlinux]  [k] copy_user_enhanced_fast_string
    1,66%     swapper    [kernel.vmlinux]  [k] intel_idle
    1,63%     netserver  [kernel.vmlinux]  [k] _raw_spin_lock
    1,59%     netserver  [sctp]            [k] sctp_packet_transmit
    1,55%     netserver  [kernel.vmlinux]  [k] memcpy_erms
    1,42%     netserver  [sctp]            [k] sctp_rcv

    # netperf -H 192.168.10.1 -l 10 -t SCTP_STREAM -cC -- -m 12000
    SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.10.1 () port 0 AF_INET
    Recv   Send   Send                         Utilization    Service Demand
    Socket Socket Message Elapsed              Send   Recv    Send   Recv
    Size   Size   Size    Time     Throughput  local  remote  local  remote
    bytes  bytes  bytes   secs.    10^6bits/s  % S    % S     us/KB  us/KB

    212992 212992 12000   10.00    3016.42     2.88   3.78    1.874  2.462

    After patch:
    Overhead  Command    Shared Object     Symbol
    2,75%     netserver  [kernel.vmlinux]  [k] memcpy_erms
    2,63%     netserver  [kernel.vmlinux]  [k] copy_user_enhanced_fast_string
    2,39%     netserver  [kernel.vmlinux]  [k] fib_table_lookup
    2,04%     netserver  [kernel.vmlinux]  [k] __pskb_pull_tail
    1,91%     netserver  [kernel.vmlinux]  [k] _raw_spin_lock
    1,91%     netserver  [sctp]            [k] sctp_packet_transmit
    1,72%     netserver  [mlx4_en]         [k] mlx4_en_process_rx_cq
    1,68%     netserver  [sctp]            [k] sctp_rcv

    # netperf -H 192.168.10.1 -l 10 -t SCTP_STREAM -cC -- -m 12000
    SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.10.1 () port 0 AF_INET
    Recv   Send   Send                         Utilization    Service Demand
    Socket Socket Message Elapsed              Send   Recv    Send   Recv
    Size   Size   Size    Time     Throughput  local  remote  local  remote
    bytes  bytes  bytes   secs.    10^6bits/s  % S    % S     us/KB  us/KB

    212992 212992 12000   10.00    3681.77     3.83   3.46    2.045  1.849

    Fixes: 3acb50c18d8d ("sctp: delay as much as possible skb_linearize")
    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     

19 Aug, 2016

2 commits

  • 1) Fix one typo: s/tn/tp/
    2) Fix the description about the "u" bits.

    Signed-off-by: Xunlei Pang
    Acked-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Xunlei Pang
     
  • Pablo Neira Ayuso says:

    ====================
    Netfilter fixes for net

    The following patchset contains Netfilter updates for your net tree,
    they are:

    1) Dump only conntrack that belong to this namespace via /proc file.
    This is some fallout from the conversion to single conntrack table
    for all netns, patch from Liping Zhang.

    2) Missing MODULE_ALIAS_NF_LOGGER() for the ARP family that prevents
    module autoloading, also from Liping Zhang.

    3) Report overquota event to the right netnamespace, again from Liping.

    4) Fix tproxy listener sk refcount that leads to crash, from
    Eric Dumazet.

    5) Fix racy refcounting on object deletion from nfnetlink and rule
    removal both for nfacct and cttimeout, from Liping Zhang.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

18 Aug, 2016

10 commits

  • In general, when we want to delete a netns, cttimeout_net_exit will
    be called before ipt_unregister_table, i.e. before ctnl_timeout_put.

    But after kfree_rcu is called in cttimeout_net_exit, we still decrease
    the timeout object's refcnt in ctnl_timeout_put. This is incorrect
    and causes a use-after-free error.

    It is easy to reproduce this problem:
    # while : ; do
    ip netns add xxx
    ip netns exec xxx nfct add timeout testx inet icmp timeout 200
    ip netns exec xxx iptables -t raw -p icmp -I OUTPUT -j CT --timeout testx
    ip netns del xxx
    done

    =======================================================================
    BUG kmalloc-96 (Tainted: G B E ): Poison overwritten
    -----------------------------------------------------------------------
    INFO: 0xffff88002b5161e8-0xffff88002b5161e8. First byte 0x6a instead of
    0x6b
    INFO: Allocated in cttimeout_new_timeout+0xd4/0x240 [nfnetlink_cttimeout]
    age=104 cpu=0 pid=3330
    ___slab_alloc+0x4da/0x540
    __slab_alloc+0x20/0x40
    __kmalloc+0x1c8/0x240
    cttimeout_new_timeout+0xd4/0x240 [nfnetlink_cttimeout]
    nfnetlink_rcv_msg+0x21a/0x230 [nfnetlink]
    [ ... ]

    So call kfree_rcu to free the timeout object only when the refcnt has
    dropped to 0. And, as nfnetlink_acct does, use atomic_cmpxchg to
    avoid the race between ctnl_timeout_try_del and ctnl_timeout_put.

    Signed-off-by: Liping Zhang
    Signed-off-by: Pablo Neira Ayuso

    Liping Zhang
     
  • Suppose that we input the following commands at first:
    # nfacct add test
    # iptables -A INPUT -m nfacct --nfacct-name test

    Now the "test" acct's refcnt is 2, but later, when we try to delete the
    "test" nfacct and the related iptables rule at the same time, a race may
    happen:

    CPU0                                    CPU1
    nfnl_acct_try_del                       nfnl_acct_put
    atomic_dec_and_test //ref=1,testfail    -
    -                                       atomic_dec_and_test //ref=0,testok
    -                                       kfree_rcu
    atomic_inc //ref=1                      -

    So after the RCU grace period, nf_acct will be freed while it is still
    linked in the nfnl_acct_list; a later access to it then causes an oops.

    Convert the atomic_dec_and_test and atomic_inc combination into a single
    atomic operation, atomic_cmpxchg, to fix this problem.

    Signed-off-by: Liping Zhang
    Signed-off-by: Pablo Neira Ayuso

    Liping Zhang
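
The single-step approach described in the message above can be modelled in userspace with C11 atomics. This is a sketch under assumptions, not the nfnetlink_acct code: the structure acct_obj and the functions obj_try_del() and obj_put() are hypothetical, and atomic_compare_exchange_strong() stands in for the kernel's atomic_cmpxchg().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical object: refcnt is 1 while only the list holds it, and
 * higher while other users (for example rules) also hold a reference. */
struct acct_obj {
    atomic_int refcnt;
};

/* Deleter: succeeds only if the list holds the last reference, moving
 * refcnt from 1 to 0 in one atomic step. On failure the count is left
 * untouched, so a concurrent put never observes a temporarily lowered
 * value and never frees an object that is still linked. */
static bool obj_try_del(struct acct_obj *o)
{
    int expected = 1;

    return atomic_compare_exchange_strong(&o->refcnt, &expected, 0);
}

/* User: drops one reference; returns true if the caller must free. */
static bool obj_put(struct acct_obj *o)
{
    return atomic_fetch_sub(&o->refcnt, 1) == 1;
}
```

Replaying the interleaving from the message with refcnt starting at 2: the delete attempt fails without changing the count, the put lowers it to 1 without freeing, and a later delete attempt succeeds.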
     
  • Pull networking fixes from David Miller:

    1) Buffers powersave frame test is reversed in cfg80211, fix from Felix
    Fietkau.

    2) Remove bogus WARN_ON in openvswitch, from Jarno Rajahalme.

    3) Fix some tg3 ethtool logic bugs, and one that would cause no
    interrupts to be generated when rx-coalescing is set to 0. From
    Satish Baddipadige and Siva Reddy Kallam.

    4) QLCNIC mailbox corruption and napi budget handling fix from Manish
    Chopra.

    5) Fix fib_trie logic when walking the trie during /proc/net/route
    output that can access a stale node pointer. From David Forster.

    6) Several sctp_diag fixes from Phil Sutter.

    7) PAUSE frame handling fixes in mlxsw driver from Ido Schimmel.

    8) Checksum fixup fixes in bpf from Daniel Borkmann.

    9) Memory leaks in nfnetlink, from Liping Zhang.

    10) Use after free in rxrpc, from David Howells.

    11) Use after free in new skb_array code of macvtap driver, from Jason
    Wang.

    12) Calipso resource leak, from Colin Ian King.

    13) mediatek bug fixes (missing stats sync init, etc.) from Sean Wang.

    14) Fix bpf non-linear packet write helpers, from Daniel Borkmann.

    15) Fix lockdep splats in macsec, from Sabrina Dubroca.

    16) hv_netvsc bug fixes from Vitaly Kuznetsov, mostly to do with VF
    handling.

    17) Various tc-action bug fixes, from CONG Wang.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
    net_sched: allow flushing tc police actions
    net_sched: unify the init logic for act_police
    net_sched: convert tcf_exts from list to pointer array
    net_sched: move tc offload macros to pkt_cls.h
    net_sched: fix a typo in tc_for_each_action()
    net_sched: remove an unnecessary list_del()
    net_sched: remove the leftover cleanup_a()
    mlxsw: spectrum: Allow packets to be trapped from any PG
    mlxsw: spectrum: Unmap 802.1Q FID before destroying it
    mlxsw: spectrum: Add missing rollbacks in error path
    mlxsw: reg: Fix missing op field fill-up
    mlxsw: spectrum: Trap loop-backed packets
    mlxsw: spectrum: Add missing packet traps
    mlxsw: spectrum: Mark port as active before registering it
    mlxsw: spectrum: Create PVID vPort before registering netdevice
    mlxsw: spectrum: Remove redundant errors from the code
    mlxsw: spectrum: Don't return upon error in removal path
    i40e: check for and deal with non-contiguous TCs
    ixgbe: Re-enable ability to toggle VLAN filtering
    ixgbe: Force VLNCTRL.VFE to be set in all VMDq paths
    ...

    Linus Torvalds
     
  • The act_police uses its own code to walk the
    action hashtable, which means that we cannot
    flush standalone tc police actions, so just
    switch to tcf_generic_walker() like other actions.

    (Joint work from Roman and Cong.)

    Signed-off-by: Roman Mashak
    Signed-off-by: Cong Wang
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Roman Mashak
     
  • Jamal reported a crash when we create a police action
    with a specific index. This is because the init logic
    is not correct: we should always create one in this
    case. Just unify the logic with other tc actions.

    Fixes: a03e6fe56971 ("act_police: fix a crash during removal")
    Reported-by: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    WANG Cong
     
  • As pointed out by Jamal, an action could be shared by
    multiple filters, so we can't use list to chain them
    any more after we get rid of the original tc_action.
    Instead, we could just save pointers to these actions
    in tcf_exts, since they are refcount'ed, so convert
    the list to an array of pointers.

    The "ugly" part is that the action API still accepts a list
    as a parameter, so I just introduce a helper function to
    convert the array of pointers to a list, instead of
    relying on the C99 feature to iterate the array.

    Fixes: a85a970af265 ("net_sched: move tc_action into tcf_common")
    Reported-by: Jamal Hadi Salim
    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    WANG Cong
     
  • This list_del() for tc action is not actually needed,
    because we only use this list to chain bulk operations,
    so it should not be carried over to later operations.

    Fixes: ec0595cc4495 ("net_sched: get rid of struct tcf_common")
    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    WANG Cong
     
  • After refactoring tc_action into tcf_common, we no
    longer need to clean up temporary "actions" in the list;
    they are permanently stored in the hashtable.

    Fixes: a85a970af265 ("net_sched: move tc_action into tcf_common")
    Reported-by: Jamal Hadi Salim
    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    WANG Cong
     
  • inet_lookup_listener() and inet6_lookup_listener() no longer
    take a reference on the found listener.

    This minimal patch adds back the refcounting, but we might do
    this differently in net-next later.

    Fixes: 3b24d854cb35 ("tcp/dccp: do not touch listener sk_refcnt under synflood")
    Reported-and-tested-by: Denys Fedoryshchenko
    Signed-off-by: Eric Dumazet
    Signed-off-by: Pablo Neira Ayuso

    Eric Dumazet
     
  • We should report the over quota message to the right net namespace
    instead of the init netns.

    Signed-off-by: Liping Zhang
    Signed-off-by: Pablo Neira Ayuso

    Liping Zhang
     

17 Aug, 2016

3 commits


16 Aug, 2016

3 commits

  • tipc_msg_create() can return a NULL skb and if so, we shouldn't try to
    call tipc_node_xmit_skb() on it.

    general protection fault: 0000 [#1] PREEMPT SMP KASAN
    CPU: 3 PID: 30298 Comm: trinity-c0 Not tainted 4.7.0-rc7+ #19
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
    task: ffff8800baf09980 ti: ffff8800595b8000 task.ti: ffff8800595b8000
    RIP: 0010:[] [] tipc_node_xmit_skb+0x6b/0x140
    RSP: 0018:ffff8800595bfce8 EFLAGS: 00010246
    RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000003023b0e0
    RDX: 0000000000000000 RSI: dffffc0000000000 RDI: ffffffff83d12580
    RBP: ffff8800595bfd78 R08: ffffed000b2b7f32 R09: 0000000000000000
    R10: fffffbfff0759725 R11: 0000000000000000 R12: 1ffff1000b2b7f9f
    R13: ffff8800595bfd58 R14: ffffffff83d12580 R15: dffffc0000000000
    FS: 00007fcdde242700(0000) GS:ffff88011af80000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fcddde1db10 CR3: 000000006874b000 CR4: 00000000000006e0
    DR0: 00007fcdde248000 DR1: 00007fcddd73d000 DR2: 00007fcdde248000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000090602
    Stack:
    0000000000000018 0000000000000018 0000000041b58ab3 ffffffff83954208
    ffffffff830bb400 ffff8800595bfd30 ffffffff8309d767 0000000000000018
    0000000000000018 ffff8800595bfd78 ffffffff8309da1a 00000000810ee611
    Call Trace:
    [] tipc_shutdown+0x553/0x880
    [] SyS_shutdown+0x14b/0x170
    [] do_syscall_64+0x19c/0x410
    [] entry_SYSCALL64_slow_path+0x25/0x25
    Code: 90 00 b4 0b 83 c7 00 f1 f1 f1 f1 4c 8d 6d e0 c7 40 04 00 00 00 f4 c7 40 08 f3 f3 f3 f3 48 89 d8 48 c1 e8 03 c7 45 b4 00 00 00 00 3c 30 00 75 78 48 8d 7b 08 49 8d 75 c0 48 b8 00 00 00 00 00
    RIP [] tipc_node_xmit_skb+0x6b/0x140
    RSP
    ---[ end trace 57b0484e351e71f1 ]---

    I feel like we should maybe return -ENOMEM or -ENOBUFS, but I'm not sure
    userspace is equipped to handle that. Anyway, this is better than a GPF
    and looks somewhat consistent with other tipc_msg_create() callers.

    Signed-off-by: Vegard Nossum
    Acked-by: Ying Xue
    Acked-by: Jon Maloy
    Signed-off-by: David S. Miller

    Vegard Nossum
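
The guard described in the message above can be sketched in userspace as follows. All names here (msg, msg_create(), msg_xmit(), send_shutdown()) are hypothetical stand-ins and not the TIPC API; the simulate_failure parameter exists only to make the failing path reachable in a test.

```c
#include <stddef.h>
#include <stdlib.h>

struct msg {
    int type;
};

/* Hypothetical allocator that may fail, like any buffer allocation. */
static struct msg *msg_create(int type, int simulate_failure)
{
    struct msg *m = simulate_failure ? NULL : malloc(sizeof(*m));

    if (m)
        m->type = type;
    return m;
}

/* Hypothetical transmit step: dereferences and consumes the message,
 * so it must never be handed a NULL pointer. */
static void msg_xmit(struct msg *m)
{
    (void)m->type;
    free(m);
}

/* Returns 1 if a message was transmitted, 0 if creation failed and the
 * transmit step was skipped. */
static int send_shutdown(int simulate_failure)
{
    struct msg *m = msg_create(0, simulate_failure);

    if (!m)
        return 0;
    msg_xmit(m);
    return 1;
}
```

Skipping the transmit silently mirrors the choice made in the message; returning an error code to the caller instead is the alternative the author mentions but does not adopt.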
     
  • Ensure that the inner_protocol is set on transmit so that GSO segmentation,
    which relies on that field, works correctly.

    This is achieved by setting the inner_protocol in gre_build_header rather
    than each caller of that function. It ensures that the inner_protocol is
    set when gre_fb_xmit() is used to transmit GRE which was not previously the
    case.

    I have observed this is not the case when OvS transmits GRE using
    lwtunnel metadata (which it always does).

    Fixes: 38720352412a ("gre: Use inner_proto to obtain inner header protocol")
    Cc: Pravin Shelar
    Acked-by: Alexander Duyck
    Signed-off-by: Simon Horman
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Simon Horman
     
  • ping_v6_sendmsg does not set flowi6_oif in response to
    sin6_scope_id or sk_bound_dev_if, so it is not possible to use
    these APIs to ping an IPv6 address on a different interface.
    Instead, it sets flowi6_iif, which is incorrect but harmless.

    Stop setting flowi6_iif, and support various ways of setting oif
    in the same priority order used by udpv6_sendmsg.

    Tested: https://android-review.googlesource.com/#/c/254470/
    Signed-off-by: Lorenzo Colitti
    Signed-off-by: David S. Miller

    Lorenzo Colitti
     

15 Aug, 2016

1 commit

  • Remove unnecessary use of enable/disable callback notifications
    and the incorrect more space available check.

    The virtio_transport_tx_work handles when the TX virtqueue
    has more buffers available.

    Signed-off-by: Gerard Garcia
    Acked-by: Stefan Hajnoczi
    Signed-off-by: Michael S. Tsirkin

    Gerard Garcia
     

14 Aug, 2016

1 commit