20 Sep, 2016

1 commit

  • When a socket is cloned, the associated sock_cgroup_data is duplicated
    but not its reference on the cgroup. As a result, the cgroup reference
    count will underflow when both sockets are destroyed later on.

    Fixes: bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup")
    Link: http://lkml.kernel.org/r/20160914194846.11153-2-hannes@cmpxchg.org
    Signed-off-by: Johannes Weiner
    Acked-by: Tejun Heo
    Cc: Michal Hocko
    Cc: Vladimir Davydov
    Cc: [4.5+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
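
The underflow described above can be sketched in userspace C. This is a minimal illustration, not the kernel code: `cgroup_sketch`, `sock_sketch` and every helper below are hypothetical stand-ins, and the counter is a plain `int` rather than the kernel's reference-counting primitives.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel objects; not the real structures. */
struct cgroup_sketch { int refcnt; };
struct sock_sketch { struct cgroup_sketch *cgrp; };

static void cgroup_get(struct cgroup_sketch *cg) { cg->refcnt++; }
static void cgroup_put(struct cgroup_sketch *cg) { cg->refcnt--; }

/* Clone a socket. With take_ref == 0 only the pointer is duplicated
 * (the buggy behaviour); with take_ref != 0 the clone also takes its
 * own reference on the cgroup. */
static struct sock_sketch sock_clone(const struct sock_sketch *sk, int take_ref)
{
    struct sock_sketch copy = *sk;

    if (take_ref)
        cgroup_get(copy.cgrp);
    return copy;
}

/* Destroying a socket always drops one reference. */
static void sock_destroy(struct sock_sketch *sk)
{
    cgroup_put(sk->cgrp);
    sk->cgrp = NULL;
}

/* Create a socket, clone it, destroy both, and report the final count.
 * The cgroup starts with one reference of its own, so 1 is the
 * balanced result. */
int refcount_after_clone_and_destroy(int take_ref)
{
    struct cgroup_sketch cg = { 1 };
    struct sock_sketch a = { &cg };
    struct sock_sketch b;

    cgroup_get(&cg);              /* reference held by socket a */
    b = sock_clone(&a, take_ref);
    sock_destroy(&a);
    sock_destroy(&b);
    return cg.refcnt;
}
```

With `take_ref` set to 0 the function returns one less than the balanced value, which is the same imbalance that eventually underflows the real counter.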
     

17 Sep, 2016

1 commit


13 Sep, 2016

2 commits

  • Pull NFS client bugfixes from Trond Myklebust:
    "Highlights include:

    Stable patches:
    - We must serialise LAYOUTGET and LAYOUTRETURN to ensure correct
    state accounting
    - Fix the CREATE_SESSION slot number

    Bugfixes:
    - sunrpc: fix a UDP memory accounting regression
    - NFS: Fix an error reporting regression in nfs_file_write()
    - pNFS: Fix further layout stateid issues
    - RPC/rdma: Revert 3d4cf35bd4fa ("xprtrdma: Reply buffer
    exhaustion...")
    - RPC/rdma: Fix receive buffer accounting"

    * tag 'nfs-for-4.8-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
    NFSv4.1: Fix the CREATE_SESSION slot number accounting
    xprtrdma: Fix receive buffer accounting
    xprtrdma: Revert 3d4cf35bd4fa ("xprtrdma: Reply buffer exhaustion...")
    pNFS: Don't forget the layout stateid if there are outstanding LAYOUTGETs
    pNFS: Clear out all layout segments if the server unsets lrp->res.lrs_present
    pNFS: Fix pnfs_set_layout_stateid() to clear NFS_LAYOUT_INVALID_STID
    pNFS: Ensure LAYOUTGET and LAYOUTRETURN are properly serialised
    NFS: Fix error reporting in nfs_file_write()
    sunrpc: fix UDP memory accounting

    Linus Torvalds
     
  • rsc_lookup steals the passed-in memory to avoid doing an allocation of
    its own, so we can't just pass in a pointer to memory that someone else
    is using.

    If we really want to avoid allocation there then maybe we should
    preallocate somewhere, or reference count these handles.

    For now we should revert.

    On occasion I see this on my server:

    kernel: kernel BUG at /home/cel/src/linux/linux-2.6/mm/slub.c:3851!
    kernel: invalid opcode: 0000 [#1] SMP
    kernel: Modules linked in: cts rpcsec_gss_krb5 sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd btrfs xor iTCO_wdt iTCO_vendor_support raid6_pq pcspkr i2c_i801 i2c_smbus lpc_ich mfd_core mei_me sg mei shpchp wmi ioatdma ipmi_si ipmi_msghandler acpi_pad acpi_power_meter rpcrdma ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm nfsd nfs_acl lockd grace auth_rpcgss sunrpc ip_tables xfs libcrc32c mlx4_ib mlx4_en ib_core sr_mod cdrom sd_mod ast drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel igb mlx4_core ahci libahci libata ptp pps_core dca i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod
    kernel: CPU: 7 PID: 145 Comm: kworker/7:2 Not tainted 4.8.0-rc4-00006-g9d06b0b #15
    kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015
    kernel: Workqueue: events do_cache_clean [sunrpc]
    kernel: task: ffff8808541d8000 task.stack: ffff880854344000
    kernel: RIP: 0010:[] [] kfree+0x155/0x180
    kernel: RSP: 0018:ffff880854347d70 EFLAGS: 00010246
    kernel: RAX: ffffea0020fe7660 RBX: ffff88083f9db064 RCX: 146ff0f9d5ec5600
    kernel: RDX: 000077ff80000000 RSI: ffff880853f01500 RDI: ffff88083f9db064
    kernel: RBP: ffff880854347d88 R08: ffff8808594ee000 R09: ffff88087fdd8780
    kernel: R10: 0000000000000000 R11: ffffea0020fe76c0 R12: ffff880853f01500
    kernel: R13: ffffffffa013cf76 R14: ffffffffa013cff0 R15: ffffffffa04253a0
    kernel: FS: 0000000000000000(0000) GS:ffff88087fdc0000(0000) knlGS:0000000000000000
    kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    kernel: CR2: 00007fed60b020c3 CR3: 0000000001c06000 CR4: 00000000001406e0
    kernel: Stack:
    kernel: ffff8808589f2f00 ffff880853f01500 0000000000000001 ffff880854347da0
    kernel: ffffffffa013cf76 ffff8808589f2f00 ffff880854347db8 ffffffffa013d006
    kernel: ffff8808589f2f20 ffff880854347e00 ffffffffa0406f60 0000000057c7044f
    kernel: Call Trace:
    kernel: [] rsc_free+0x16/0x90 [auth_rpcgss]
    kernel: [] rsc_put+0x16/0x30 [auth_rpcgss]
    kernel: [] cache_clean+0x2e0/0x300 [sunrpc]
    kernel: [] do_cache_clean+0xe/0x70 [sunrpc]
    kernel: [] process_one_work+0x1ff/0x3b0
    kernel: [] worker_thread+0x2bc/0x4a0
    kernel: [] ? rescuer_thread+0x3a0/0x3a0
    kernel: [] kthread+0xe4/0xf0
    kernel: [] ret_from_fork+0x1f/0x40
    kernel: [] ? kthread_stop+0x110/0x110
    kernel: Code: f7 ff ff eb 3b 65 8b 05 da 30 e2 7e 89 c0 48 0f a3 05 a0 38 b8 00 0f 92 c0 84 c0 0f 85 d1 fe ff ff 0f 1f 44 00 00 e9 f5 fe ff ff 0b 49 8b 03 31 f6 f6 c4 40 0f 85 62 ff ff ff e9 61 ff ff ff
    kernel: RIP [] kfree+0x155/0x180
    kernel: RSP
    kernel: ---[ end trace 3fdec044969def26 ]---

    It seems to be most common after a server reboot where a client has been
    using a Kerberos mount, and reconnects to continue its workload.

    Signed-off-by: Chuck Lever
    Cc: stable@vger.kernel.org
    Signed-off-by: J. Bruce Fields

    Chuck Lever
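
The ownership problem described above can be illustrated with a small userspace sketch. It is not the sunrpc code, and the commit itself simply reverts. `handle_sketch`, `entry_sketch` and both functions are hypothetical names. The sketch shows a lookup-style helper that steals a buffer, plus a caller that hands over a private copy so that memory still used elsewhere is never taken.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; not the real sunrpc structures. */
struct handle_sketch { char *data; size_t len; };
struct entry_sketch { struct handle_sketch handle; };

/* Takes ownership of src->data without allocating, and empties src so
 * that the buffer has exactly one owner afterwards. */
void entry_steal_handle(struct entry_sketch *e, struct handle_sketch *src)
{
    e->handle = *src;
    src->data = NULL;
    src->len = 0;
}

/* Safe for a buffer that someone else is still using: duplicate it
 * first and let the entry steal the duplicate.
 * Returns 0 on success, -1 if the allocation fails. */
int entry_init_from_shared(struct entry_sketch *e,
                           const struct handle_sketch *shared)
{
    struct handle_sketch tmp;

    tmp.len = shared->len;
    tmp.data = malloc(shared->len ? shared->len : 1);
    if (!tmp.data)
        return -1;
    if (shared->len)
        memcpy(tmp.data, shared->data, shared->len);
    entry_steal_handle(e, &tmp);
    return 0;
}
```

The copy costs an allocation, which is what the stealing design tries to avoid. The message above names preallocation or reference counting as the other options.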
     

12 Sep, 2016

1 commit

  • Pull networking fixes from David Miller:
    "Mostly small sets of driver fixes scattered all over the place.

    1) Mediatek driver fixes from Sean Wang. Forward port not written
    correctly during TX map, missed handling of EPROBE_DEFER, and
    mistaken use of put_page() instead of skb_free_frag().

    2) Fix socket double-free in KCM code, from WANG Cong.

    3) QED driver fixes from Sudarsana Reddy Kalluru, including a fix for
    using the dcbx buffers before initializing them.

    4) Mellanox Switch driver fixes from Jiri Pirko, including a fix for
    double fib removals and an error handling fix in
    mlxsw_sp_module_init().

    5) Fix kernel panic when enabling LLDP in i40e driver, from Dave
    Ertman.

    6) Fix padding of TSO packets in thunderx driver, from Sunil Goutham.

    7) TCP's rcv_wup not initialized properly when using fastopen, from
    Neal Cardwell.

    8) Don't use uninitialized flow keys in flow dissector, from Gao
    Feng.

    9) Use after free in l2tp module unload, from Sabrina Dubroca.

    10) Fix interrupt registry ordering issues in smsc911x driver, from
    Jeremy Linton.

    11) Fix crashes in bonding having to do with enslaving and rx_handler,
    from Mahesh Bandewar.

    12) AF_UNIX deadlock fixes from Linus.

    13) In mlx5 driver, don't read skb->xmit_mode after it might have been
    freed from the TX reclaim path. From Tariq Toukan.

    14) Fix a bug from 2015 in TCP Yeah where the congestion window does
    not increase, from Artem Germanov.

    15) Don't pad frames on receive in NFP driver, from Jakub Kicinski.

    16) Fix chunk fragmenting in SCTP wrt. GSO, from Marcelo Ricardo
    Leitner.

    17) Fix deletion of VRF routes, from Mark Tomlinson.

    18) Fix device refcount leak when DAD fails in ipv6, from Wei Yongjun"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (101 commits)
    net/mlx4_en: Fix panic on xmit while port is down
    net/mlx4_en: Fixes for DCBX
    net/mlx4_en: Fix the return value of mlx4_en_dcbnl_set_state()
    net/mlx4_en: Fix the return value of mlx4_en_dcbnl_set_all()
    net: ethernet: renesas: sh_eth: add POST registers for rz
    drivers: net: phy: mdio-xgene: Add hardware dependency
    dwc_eth_qos: do not register semi-initialized device
    sctp: identify chunks that need to be fragmented at IP level
    mlxsw: spectrum: Set port type before setting its address
    mlxsw: spectrum_router: Fix error path in mlxsw_sp_router_init
    nfp: don't pad frames on receive
    nfp: drop support for old firmware ABIs
    nfp: remove linux/version.h includes
    tcp: cwnd does not increase in TCP YeAH
    net/mlx5e: Fix parsing of vlan packets when updating lro header
    net/mlx5e: Fix global PFC counters replication
    net/mlx5e: Prevent casting overflow
    net/mlx5e: Move an_disable_cap bit to a new position
    net/mlx5e: Fix xmit_more counter race issue
    tcp: fastopen: avoid negative sk_forward_alloc
    ...

    Linus Torvalds
     

10 Sep, 2016

1 commit

  • Previously, without GSO, it was easy to identify it: if the chunk didn't
    fit and there was no data chunk in the packet yet, we could fragment at
    IP level. So if there was an auth chunk and we were bundling a big data
    chunk, it would fragment regardless of the size of the auth chunk. This
    also works for the context of PMTU reductions.

    But with GSO, we cannot distinguish such PMTU events anymore, as the
    packet is allowed to exceed PMTU.

    So we need another check: to ensure that the chunk that we are adding,
    actually fits the current PMTU. If it doesn't, trigger a flush and let
    it be fragmented at IP level in the next round.

    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
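
One way to read the decision described above is sketched below. It is a loose interpretation and not the SCTP implementation: the structure, its fields and the three action names are all assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical actions and packet state; not the SCTP definitions. */
enum fit_action { FIT_APPEND, FIT_FLUSH_FIRST, FIT_IP_FRAGMENT };

struct packet_sketch {
    size_t size;      /* bytes already in the packet, headers included */
    size_t overhead;  /* header bytes every packet carries */
    size_t max_size;  /* packet limit; with GSO it may exceed pmtu */
    size_t pmtu;      /* current path MTU */
    int has_data;     /* a data chunk is already bundled */
    int empty;        /* nothing bundled yet */
};

enum fit_action chunk_fit_action(const struct packet_sketch *p, size_t chunk_len)
{
    /* Pre-GSO rule: the chunk does not fit in the packet at all. */
    if (p->size + chunk_len > p->max_size) {
        if (p->empty || !p->has_data)
            return FIT_IP_FRAGMENT;
        return FIT_FLUSH_FIRST;
    }

    /* Additional rule: the packet limit may be larger than the PMTU,
     * so check the chunk against the PMTU on its own. If it is too
     * big, flush what is queued and fragment it in the next round. */
    if (chunk_len + p->overhead > p->pmtu)
        return p->empty ? FIT_IP_FRAGMENT : FIT_FLUSH_FIRST;

    return FIT_APPEND;
}
```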
     

09 Sep, 2016

3 commits

  • Commit 76174004a0f19785a328f40388e87e982bbf69b9
    ("tcp: do not slow start when cwnd equals ssthresh")
    introduced a regression in TCP YeAH. On a virtual ethernet link with
    100ms delay and 1% loss, kernel 4.2 shows a bandwidth of ~500KB/s for a
    single TCP connection, while kernel 4.3 and above (including 4.8-rc4)
    show ~100KB/s.
    That is caused by a stalled cwnd when cwnd equals ssthresh. This patch
    fixes it by properly increasing cwnd in this case.

    Signed-off-by: Artem Germanov
    Acked-by: Dmitry Adamushko
    Signed-off-by: David S. Miller

    Artem Germanov
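
A stall at `cwnd == ssthresh` can arise when slow start is capped at ssthresh but is still selected at equality. The sketch below shows that general mechanism only. It is not the YeAH code, the function names are hypothetical, and congestion avoidance is reduced to a fixed step of one.

```c
/* Slow start that never grows the window past ssthresh. */
static unsigned int slow_start_capped(unsigned int cwnd, unsigned int ssthresh,
                                      unsigned int acked)
{
    unsigned int next = cwnd + acked;

    return next < ssthresh ? next : ssthresh;
}

/* Simplified congestion avoidance: grow by one per call. */
static unsigned int cong_avoid_step(unsigned int cwnd)
{
    return cwnd + 1;
}

/* Selects slow start even when cwnd == ssthresh: the cap then keeps
 * the window where it is, so it never grows. */
unsigned int grow_stalling(unsigned int cwnd, unsigned int ssthresh,
                           unsigned int acked)
{
    if (cwnd <= ssthresh)
        return slow_start_capped(cwnd, ssthresh, acked);
    return cong_avoid_step(cwnd);
}

/* Selects slow start only while cwnd < ssthresh, so equality falls
 * through to congestion avoidance and the window keeps growing. */
unsigned int grow_fixed(unsigned int cwnd, unsigned int ssthresh,
                        unsigned int acked)
{
    if (cwnd < ssthresh)
        return slow_start_capped(cwnd, ssthresh, acked);
    return cong_avoid_step(cwnd);
}
```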
     
  • When DATA and/or FIN are carried in a SYN/ACK message or SYN message,
    we append an skb in socket receive queue, but we forget to call
    sk_forced_mem_schedule().

    Effect is that the socket has a negative sk->sk_forward_alloc as long as
    the message is not read by the application.

    Josh Hunt fixed a similar issue in commit d22e15371811 ("tcp: fix tcp
    fin memory accounting")

    Fixes: 168a8f58059a ("tcp: TCP Fast Open Server - main code path")
    Signed-off-by: Eric Dumazet
    Reviewed-by: Josh Hunt
    Signed-off-by: David S. Miller

    Eric Dumazet
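
The accounting effect described above can be modelled with a single counter. This is a simplified sketch and not the kernel's socket memory accounting: `sk_sketch`, the helper names and the 4096-byte quantum are assumptions.

```c
/* Hypothetical model of a socket's pre-reserved memory budget. */
struct sk_sketch { int forward_alloc; };

#define MEM_QUANTUM_SKETCH 4096 /* assumed reservation granularity */

/* Reserve enough whole quanta to cover 'size' bytes. */
static void mem_schedule_sketch(struct sk_sketch *sk, int size)
{
    if (sk->forward_alloc < size) {
        int quanta = (size - sk->forward_alloc + MEM_QUANTUM_SKETCH - 1)
                     / MEM_QUANTUM_SKETCH;

        sk->forward_alloc += quanta * MEM_QUANTUM_SKETCH;
    }
}

/* Queueing a buffer charges its size against the reservation. */
static void charge_sketch(struct sk_sketch *sk, int size)
{
    sk->forward_alloc -= size;
}

/* Queue one buffer of 'truesize' bytes on a fresh socket and return
 * the remaining budget. Skipping the reservation leaves it negative. */
int forward_alloc_after_queue(int truesize, int schedule_first)
{
    struct sk_sketch sk = { 0 };

    if (schedule_first)
        mem_schedule_sketch(&sk, truesize);
    charge_sketch(&sk, truesize);
    return sk.forward_alloc;
}
```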
     
  • Steffen Klassert says:

    ====================
    ipsec 2016-09-08

    1) Fix a crash when xfrm_dump_sa returns an error.
    From Vegard Nossum.

    2) Remove some incorrect WARN() on normal error handling.
    From Vegard Nossum.

    3) Ignore socket policies when rebuilding hash tables,
    socket policies are not inserted into the hash tables.
    From Tobias Brunner.

    4) Initialize and check tunnel pointers properly before
    we use them. From Alexey Kodanev.

    5) Fix l3mdev oif setting on xfrm dst lookups.
    From David Ahern.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

07 Sep, 2016

5 commits

  • In general, when DAD detects a duplicate IPv6 address, ifp->state
    is set to INET6_IFADDR_STATE_ERRDAD and DAD is stopped by a
    delayed work. The call tree should look like this:

    ndisc_recv_ns
    -> addrconf_dad_failure
    -> addrconf_mod_dad_work
    -> schedule addrconf_dad_work()
    -> addrconf_dad_stop()
    Signed-off-by: David S. Miller

    Wei Yongjun
     
  • When deleting an IP address from an interface, there is a clean-up of
    routes which refer to this local address. However, there was no check to
    see that the VRF matched. This meant that deletion wasn't confined to
    the VRF it should have been.

    To solve this, a new field has been added to fib_info to hold a table
    id. When removing fib entries corresponding to a local ip address, this
    table id is also used in the comparison.

    The table id is populated when the fib_info is created. This was already
    done in some places, but not in ip_rt_ioctl(). This has now been fixed.

    Fixes: 021dd3b8a142 ("net: Add routes to the table associated with the device")
    Acked-by: David Ahern
    Tested-by: David Ahern
    Signed-off-by: Mark Tomlinson
    Signed-off-by: David S. Miller

    Mark Tomlinson
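
The extra comparison described above can be sketched as follows. This is not the kernel's fib code: `fib_info_sketch` and its fields are hypothetical stand-ins that only show why the table id has to be part of the match.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a route entry. */
struct fib_info_sketch {
    uint32_t prefsrc; /* local address the route refers to */
    uint32_t tb_id;   /* table (VRF) the route belongs to */
    int dead;
};

/* Mark routes that refer to 'local' as dead, but only within table
 * 'tb_id'. Without the tb_id comparison, routes in other VRFs that
 * use the same address would be removed as well.
 * Returns the number of entries newly marked. */
size_t mark_dead_for_local_addr(struct fib_info_sketch *fi, size_t n,
                                uint32_t local, uint32_t tb_id)
{
    size_t i, marked = 0;

    for (i = 0; i < n; i++) {
        if (fi[i].prefsrc == local && fi[i].tb_id == tb_id && !fi[i].dead) {
            fi[i].dead = 1;
            marked++;
        }
    }
    return marked;
}
```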
     
  • An RPC can terminate before its reply arrives, if a credential
    problem or a soft timeout occurs. After this happens, xprtrdma
    reports it is out of Receive buffers.

    A Receive buffer is posted before each RPC is sent, and returned to
    the buffer pool when a reply is received. If no reply is received
    for an RPC, that Receive buffer remains posted. But xprtrdma tries
    to post another when the next RPC is sent.

    If this happens a few dozen times, there are no receive buffers left
    to be posted at send time. I don't see a way for a transport
    connection to recover at that point, and it will spit warnings and
    unnecessarily delay RPCs on occasion for its remaining lifetime.

    Commit 1e465fd4ff47 ("xprtrdma: Replace send and receive arrays")
    removed a little bit of logic to detect this case and not provide
    a Receive buffer so no more buffers are posted, and then transport
    operation continues correctly. We didn't understand what that logic
    did, and it wasn't commented, so it was removed as part of the
    overhaul to support backchannel requests.

    Restore it, but be wary of the need to keep extra Receives posted
    to deal with backchannel requests.

    Fixes: 1e465fd4ff47 ("xprtrdma: Replace send and receive arrays")
    Signed-off-by: Chuck Lever
    Reviewed-by: Anna Schumaker
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • Receive buffer exhaustion, if it were to actually occur, would be
    catastrophic. However, when there are no reply buffers to post, that
    means all of them have already been posted and are waiting for
    incoming replies. By design, there can never be more RPCs in flight
    than there are available receive buffers.

    A receive buffer can be left posted after an RPC exits without a
    received reply; say, due to a credential problem or a soft timeout.
    This does not result in fewer posted receive buffers than there are
    pending RPCs, and there is already logic in xprtrdma to deal
    appropriately with this case.

    It also looks like the "+ 2" that was removed was accidentally
    accommodating the number of extra receive buffers needed for
    receiving backchannel requests. That will need to be addressed by
    another patch.

    Fixes: 3d4cf35bd4fa ("xprtrdma: Reply buffer exhaustion can be...")
    Signed-off-by: Chuck Lever
    Reviewed-by: Anna Schumaker
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • Neither the failure nor the success path of ping_v6_sendmsg releases
    the dst it acquires. This leads to a flood of warnings from
    "net/core/dst.c:288 dst_release" on older kernels that
    don't have 8bf4ada2e21378816b28205427ee6b0e1ca4c5f1 backported.

    That patch optimistically hoped this had been fixed post 3.10, but
    it seems at least one case wasn't, where I've seen this triggered
    a lot from machines doing unprivileged icmp sockets.

    Cc: Martin Lau
    Signed-off-by: Dave Jones
    Acked-by: Martin KaFai Lau
    Signed-off-by: David S. Miller

    Dave Jones
     

05 Sep, 2016

3 commits

  • Right now we use the 'readlock' both for protecting some of the af_unix
    IO path and for making the bind be single-threaded.

    The two are independent, but using the same lock makes for a nasty
    deadlock due to ordering with regards to filesystem locking. The bind
    locking would want to nest outside the VFS pathname locking, but the IO
    locking wants to nest inside some of those same locks.

    We tried to fix this earlier with commit c845acb324aa ("af_unix: Fix
    splice-bind deadlock") which moved the readlock inside the vfs locks,
    but that caused problems with overlayfs that will then call back into
    filesystem routines that take the lock in the wrong order anyway.

    Splitting the locks means that we can go back to having the bind lock be
    the outermost lock, and we don't have any deadlocks with lock ordering.

    Acked-by: Rainer Weikusat
    Acked-by: Al Viro
    Signed-off-by: Linus Torvalds
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Linus Torvalds
     
  • This reverts commit c845acb324aa85a39650a14e7696982ceea75dc1.

    It turns out that it just replaces one deadlock with another one: we can
    still get the wrong lock ordering with the readlock due to overlayfs
    calling back into the filesystem layer and still taking the vfs locks
    after the readlock.

    The proper solution ends up being to just split the readlock into two
    pieces: the bind lock (taken *outside* the vfs locks) and the IO lock
    (taken *inside* the filesystem locks). The two locks are independent
    anyway.

    Signed-off-by: Linus Torvalds
    Reviewed-by: Shmulik Ladkani
    Signed-off-by: David S. Miller

    Linus Torvalds
     
  • The following few steps will crash the kernel:

    (a) Create bonding master
    > modprobe bonding miimon=50
    (b) Create macvlan bridge on eth2
    > ip link add link eth2 dev mvl0 address aa:0:0:0:0:01 \
    type macvlan
    (c) Now try adding eth2 into the bond
    > echo +eth2 > /sys/class/net/bond0/bonding/slaves

    Bonding does lots of things before checking if the device enslaved is
    busy or not.

    In this case, when the notifier call-chain sends notifications,
    bond_netdev_event() assumes that the rx_handler/rx_handler_data is
    registered, while bond_enslave() hasn't progressed far enough to
    register the rx_handler for the new slave.

    This patch adds a rx_handler check that can be performed right at the
    beginning of the enslave code to avoid getting into this situation.

    Signed-off-by: Mahesh Bandewar
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Mahesh Bandewar
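
The idea of checking early can be sketched like this. It is not the bonding driver: `netdev_sketch` and `enslave_sketch` are hypothetical, and the single counter stands in for all the setup work that should not start on a busy device.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a network device. */
struct netdev_sketch {
    void *rx_handler;     /* non-NULL when another user owns the device */
    int setup_steps_done; /* stands in for work done while enslaving */
};

/* Refuse a device that already has an rx_handler before touching any
 * of its state, so no notification can observe a half-enslaved device. */
int enslave_sketch(struct netdev_sketch *slave, void *bond_handler)
{
    if (slave->rx_handler != NULL)
        return -EBUSY;

    slave->setup_steps_done++;        /* setup happens only after the check */
    slave->rx_handler = bond_handler; /* registered at the end */
    return 0;
}
```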
     

03 Sep, 2016

2 commits

  • The commit f9b2ee714c5c ("SUNRPC: Move UDP receive data path
    into a workqueue context"), as a side effect, moved the
    skb_free_datagram() call outside the scope of the related socket
    lock, but UDP sockets require such lock to be held for proper
    memory accounting.
    Fix it by replacing skb_free_datagram() with
    skb_free_datagram_locked().

    Fixes: f9b2ee714c5c ("SUNRPC: Move UDP receive data path into a workqueue context")
    Reported-and-tested-by: Jan Stancek
    Signed-off-by: Paolo Abeni
    Cc: stable@vger.kernel.org # 4.4+
    Signed-off-by: Trond Myklebust

    Paolo Abeni
     
  • Tunnel deletion is delayed by both a workqueue (l2tp_tunnel_delete -> wq
    -> l2tp_tunnel_del_work) and RCU (sk_destruct -> RCU ->
    l2tp_tunnel_destruct).

    By the time l2tp_tunnel_destruct() runs to destroy the tunnel and finish
    destroying the socket, the private data reserved via the net_generic
    mechanism has already been freed, but l2tp_tunnel_destruct() actually
    uses this data.

    Make sure tunnel deletion for the netns has completed before returning
    from l2tp_exit_net() by first flushing the tunnel removal workqueue, and
    then waiting for RCU callbacks to complete.

    Fixes: 167eb17e0b17 ("l2tp: create tunnel sockets in the right namespace")
    Signed-off-by: Sabrina Dubroca
    Signed-off-by: David S. Miller

    Sabrina Dubroca
     

02 Sep, 2016

7 commits

  • Commit 8eb30be0352d0916 ("ipv6: Create ip6_tnl_xmit") unsets
    flowi6_proto in ip4ip6_tnl_xmit() and ip6ip6_tnl_xmit().
    Since xfrm_selector_match() relies on this info, IPv6 packets
    sent by an ip6tunnel cannot be properly selected by their
    protocols after removing it. This patch puts flowi6_proto back.

    Cc: stable@vger.kernel.org
    Fixes: 8eb30be0352d ("ipv6: Create ip6_tnl_xmit")
    Signed-off-by: Eli Cooper
    Signed-off-by: David S. Miller

    Eli Cooper
     
  • The original code depends on the function parameters being evaluated from
    left to right. But the evaluation order of parameters is actually not
    specified by the C standard.

    When flow_keys_have_l4(&keys) is invoked before ___skb_get_hash(skb, &keys,
    hashrnd) with some compilers or environments, the keys passed to
    flow_keys_have_l4 are not initialized.

    Fixes: 6db61d79c1e1 ("flow_dissector: Ignore flow dissector return value from ___skb_get_hash")

    Acked-by: Eric Dumazet
    Signed-off-by: Gao Feng
    Signed-off-by: David S. Miller

    Gao Feng
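
The evaluation-order hazard described above is a general C issue and can be shown in a few lines. The names below are hypothetical stand-ins for the kernel helpers. The unsafe call is shown only in a comment, and the compiled code uses a local variable to force the order.

```c
/* Hypothetical stand-ins for the flow keys and the skb. */
struct keys_sketch { int has_l4; };
struct skb_sketch { unsigned int hash; int l4_hash; };

/* Fills in 'keys' as a side effect and returns the hash. */
static unsigned int get_hash_sketch(struct keys_sketch *keys)
{
    keys->has_l4 = 1;
    return 0x9e3779b9u;
}

/* Only meaningful after get_hash_sketch() has filled in 'keys'. */
static int keys_have_l4_sketch(const struct keys_sketch *keys)
{
    return keys->has_l4;
}

static void set_sw_hash_sketch(struct skb_sketch *skb, unsigned int hash,
                               int is_l4)
{
    skb->hash = hash;
    skb->l4_hash = is_l4;
}

void skb_get_hash_sketch(struct skb_sketch *skb)
{
    struct keys_sketch keys;
    unsigned int hash;

    /* Unsafe form: the two arguments may be evaluated in either order,
     * so keys may be read before it is written:
     *
     *   set_sw_hash_sketch(skb, get_hash_sketch(&keys),
     *                      keys_have_l4_sketch(&keys));
     */

    /* Safe form: a separate statement sequences the write first. */
    hash = get_hash_sketch(&keys);
    set_sw_hash_sketch(skb, hash, keys_have_l4_sketch(&keys));
}
```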
     
  • Yuchung noticed that on the first TFO server data packet sent after
    the (TFO) handshake, the server echoed the TCP timestamp value in the
    SYN/data instead of the timestamp value in the final ACK of the
    handshake. This problem did not happen on regular opens.

    The tcp_replace_ts_recent() logic that decides whether to remember an
    incoming TS value needs tp->rcv_wup to hold the latest receive
    sequence number that we have ACKed (latest tp->rcv_nxt we have
    ACKed). This commit fixes this issue by ensuring that a TFO server
    properly updates tp->rcv_wup to match tp->rcv_nxt at the time it sends
    a SYN/ACK for the SYN/data.

    Reported-by: Yuchung Cheng
    Signed-off-by: Neal Cardwell
    Signed-off-by: Yuchung Cheng
    Signed-off-by: Eric Dumazet
    Signed-off-by: Soheil Hassas Yeganeh
    Fixes: 168a8f58059a ("tcp: TCP Fast Open Server - main code path")
    Signed-off-by: David S. Miller

    Neal Cardwell
     
  • pskb_may_pull may fail due to various reasons (e.g. alloc failure), but the
    skb isn't changed/dropped and processing continues so we shouldn't
    increment tx_dropped.

    CC: Kyeyoon Park
    CC: Roopa Prabhu
    CC: Stephen Hemminger
    CC: bridge@lists.linux-foundation.org
    Fixes: 958501163ddd ("bridge: Add support for IEEE 802.11 Proxy ARP")
    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • All changes are notified, but the initial state was missing.

    Signed-off-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    Nicolas Dichtel
     
  • The 'default' value was not advertised.

    Fixes: f3a1bfb11ccb ("rtnl/ipv6: use netconf msg to advertise forwarding status")
    Signed-off-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    Nicolas Dichtel
     
  • In a dual bearer configuration, if the second tipc link becomes
    active while the first link still has pending nametable "bulk"
    updates, it randomly leads to reset of the second link.

    When a link is established, the function named_distribute()
    fills the skb based on the node mtu (which allows room for
    TUNNEL_PROTOCOL) with a NAME_DISTRIBUTOR message for each PUBLICATION.
    However, named_distribute() allocates the buffer by
    increasing the node mtu by INT_H_SIZE (to insert NAME_DISTRIBUTOR).
    This consumes the space allocated for TUNNEL_PROTOCOL.

    When establishing the second link, the link shall tunnel all the
    messages in the first link queue including the "bulk" update.
    As the size of the NAME_DISTRIBUTOR messages exceeds the link mtu
    while tunnelling, the transmission fails (-EMSGSIZE).

    Thus, the synch point based on the message count of the tunnel
    packets is never reached, leading to link timeout.

    In this commit, we adjust the size of the name distributor messages so
    that they can be tunnelled.

    Reviewed-by: Jon Maloy
    Signed-off-by: Parthasarathy Bhuvaragan
    Signed-off-by: David S. Miller

    Parthasarathy Bhuvaragan
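
The size adjustment described above comes down to subtracting the header before rounding the payload to whole items. The sketch below shows only that arithmetic. It is not the TIPC code, and the header size of 40 bytes and item size of 20 bytes are assumed values chosen for illustration.

```c
#define HDR_SIZE_SKETCH  40u /* assumed header size, for illustration */
#define ITEM_SIZE_SKETCH 20u /* assumed size of one publication item */

/* Largest payload made of whole items that still leaves room for the
 * header within 'mtu'. Returns 0 if the mtu cannot hold the header. */
unsigned int bulk_payload_size(unsigned int mtu)
{
    if (mtu <= HDR_SIZE_SKETCH)
        return 0;
    return ((mtu - HDR_SIZE_SKETCH) / ITEM_SIZE_SKETCH) * ITEM_SIZE_SKETCH;
}

/* Header plus payload. For any mtu larger than the header this never
 * exceeds 'mtu', so the room the mtu leaves for tunnelling stays free. */
unsigned int bulk_message_size(unsigned int mtu)
{
    return HDR_SIZE_SKETCH + bulk_payload_size(mtu);
}
```

Sizing the payload from the full mtu and then adding the header on top would instead give a message larger than the mtu, which is the situation the message above describes.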
     

01 Sep, 2016

2 commits

  • Dmitry reported a double free on kcm socket, which could
    be easily reproduced by:

    #include <unistd.h>
    #include <sys/syscall.h>

    int main()
    {
    int fd = syscall(SYS_socket, 0x29ul, 0x5ul, 0x0ul, 0, 0, 0);
    syscall(SYS_ioctl, fd, 0x89e2ul, 0x20a98000ul, 0, 0, 0);
    return 0;
    }

    This is because on the error path, after we install
    the new socket file, we call sock_release() to clean
    up the socket, which leaves the fd pointing to a freed
    socket. Fix this by calling sys_close() on that fd
    directly.

    Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module")
    Reported-by: Dmitry Vyukov
    Cc: Tom Herbert
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    WANG Cong
     
  • commit bc8c20acaea1 ("bridge: multicast: treat igmpv3 report with
    INCLUDE and no sources as a leave") seems to have accidentally reverted
    commit 47cc84ce0c2f ("bridge: fix parsing of MLDv2 reports"). This
    commit brings back a change to br_ip6_multicast_mld2_report() where
    parsing of MLDv2 reports stops when the first group is successfully
    added to the MDB cache.

    Fixes: bc8c20acaea1 ("bridge: multicast: treat igmpv3 report with INCLUDE and no sources as a leave")
    Signed-off-by: Davide Caratti
    Acked-by: Nikolay Aleksandrov
    Acked-by: Thadeu Lima de Souza Cascardo
    Signed-off-by: David S. Miller

    Davide Caratti
     

31 Aug, 2016

3 commits

  • Pablo Neira Ayuso says:

    ====================
    Netfilter fixes for net

    The following patchset contains Netfilter fixes for your net tree,
    they are:

    1) Allow nf_tables reject expression from input, forward and output hooks,
    since only there the routing information is available, otherwise we crash.

    2) Fix unsafe list iteration when flushing timeout and accounting objects.

    3) Fix refcount leak on timeout policy parsing failure.

    4) Unlink timeout object for unconfirmed conntracks too

    5) Missing validation of pkttype mangling from bridge family.

    6) Fix refcount leak on ebtables on second lookup for the specific
    bridge match extension, this patch from Sabrina Dubroca.

    7) Remove unnecessary ip_hdr() in nf_tables_netdev family.

    Patches from 1-5 and 7 from Liping Zhang.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • …kernel/git/jberg/mac80211

    Johannes Berg says:

    ====================
    Three little fixes:
    * revert a recent wext patch, which Ben Hutchings noticed was
    wrong, and it turns out not to be necessary for any driver

    * fix an infinite loop that can occur under certain conditions
    in mac80211's TDLS code (depending on regulatory information)

    * add a cfg80211_get_station() static inline when cfg80211 isn't
    built, to allow other modules to not have to depend on it for it
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>

    David S. Miller
     
  • Pull NFS client bugfixes from Trond Myklebust:
    "Highlights include:

    Stable patches:
    - Fix a refcount leak in nfs_callback_up_net
    - Fix an Oopsable condition when the flexfile pNFS driver connection
    to the DS fails
    - Fix an Oopsable condition in NFSv4.1 server callback races
    - Ensure pNFS clients stop doing I/O to the DS if their lease has
    expired, as required by the NFSv4.1 protocol

    Bugfixes:
    - Fix potential looping in the NFSv4.x migration code
    - Patch series to close callback races for OPEN, LAYOUTGET and
    LAYOUTRETURN
    - Silence WARN_ON when NFSv4.1 over RDMA is in use
    - Fix a LAYOUTCOMMIT race in the pNFS/blocks client
    - Fix pNFS timeout issues when the DS fails"

    * tag 'nfs-for-4.8-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
    NFSv4.x: Fix a refcount leak in nfs_callback_up_net
    NFS4: Avoid migration loops
    pNFS/flexfiles: Fix an Oopsable condition when connection to the DS fails
    NFSv4.1: Remove obsolete and incorrrect assignment in nfs4_callback_sequence
    NFSv4.1: Close callback races for OPEN, LAYOUTGET and LAYOUTRETURN
    NFSv4.1: Defer bumping the slot sequence number until we free the slot
    NFSv4.1: Delay callback processing when there are referring triples
    NFSv4.1: Fix Oopsable condition in server callback races
    SUNRPC: Silence WARN_ON when NFSv4.1 over RDMA is in use
    pnfs/blocklayout: update last_write_offset atomically with extents
    pNFS: The client must not do I/O to the DS if it's lease has expired
    pNFS: Handle NFS4ERR_OLD_STATEID correctly in LAYOUTSTAT calls
    pNFS/flexfiles: Set reasonable default retrans values for the data channel
    NFS: Allow the mount option retrans=0
    pNFS/flexfiles: Fix layoutstat periodic reporting

    Linus Torvalds
     

30 Aug, 2016

2 commits


27 Aug, 2016

1 commit


26 Aug, 2016

4 commits


25 Aug, 2016

2 commits

  • commit bcf493428840 ("netfilter: ebtables: Fix extension lookup with
    identical name") added a second lookup in case the extension that was
    found during the first lookup matched another extension with the same
    name, but didn't release the reference on the incorrect module.

    Fixes: bcf493428840 ("netfilter: ebtables: Fix extension lookup with identical name")
    Signed-off-by: Sabrina Dubroca
    Acked-by: Phil Sutter
    Signed-off-by: Pablo Neira Ayuso

    Sabrina Dubroca
     
  • "meta pkttype set" is only supported on prerouting chain with bridge
    family and ingress chain with netdev family.

    But the validation check is incomplete, and the user can add the nft
    rules on the input chain with bridge family, for example:
    # nft add table bridge filter
    # nft add chain bridge filter input {type filter hook input \
    priority 0 \;}
    # nft add chain bridge filter test
    # nft add rule bridge filter test meta pkttype set unicast
    # nft add rule bridge filter input jump test

    This patch fixes the problem.

    Signed-off-by: Liping Zhang
    Signed-off-by: Pablo Neira Ayuso

    Liping Zhang
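
The restriction stated above can be written as a small validation function. This is a sketch of the rule only, not the nf_tables implementation: the enums and the function name are hypothetical, and it does not model how a rule in a non-base chain is reached through a jump.

```c
#include <errno.h>

/* Hypothetical identifiers; not the nf_tables constants. */
enum family_sketch { FAMILY_BRIDGE, FAMILY_NETDEV, FAMILY_OTHER };
enum hook_sketch {
    HOOK_PREROUTING, HOOK_INPUT, HOOK_FORWARD,
    HOOK_OUTPUT, HOOK_POSTROUTING, HOOK_INGRESS
};

/* "meta pkttype set" is accepted only on bridge/prerouting and
 * netdev/ingress. Returns 0 if allowed, -EOPNOTSUPP otherwise. */
int pkttype_set_validate(enum family_sketch family, enum hook_sketch hook)
{
    switch (family) {
    case FAMILY_BRIDGE:
        return hook == HOOK_PREROUTING ? 0 : -EOPNOTSUPP;
    case FAMILY_NETDEV:
        return hook == HOOK_INGRESS ? 0 : -EOPNOTSUPP;
    default:
        return -EOPNOTSUPP;
    }
}
```

In the example from the message, the rule sits in chain `test` and is reached from the bridge `input` hook, so a complete check has to be applied for every hook from which the rule can be reached.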