31 Aug, 2019

1 commit


28 Aug, 2019

3 commits

  • In set_secret(), key->tfm is assigned to NULL on line 55, and then
    ceph_crypto_key_destroy(key) is executed.

    ceph_crypto_key_destroy(key)
    crypto_free_sync_skcipher(key->tfm)
    crypto_free_skcipher(&tfm->base);

    This happens to work because crypto_sync_skcipher is a trivial wrapper
    around crypto_skcipher: &tfm->base is still 0 and crypto_free_skcipher()
    handles that. Let's not rely on the layout of crypto_sync_skcipher.
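The layout dependence described above can be illustrated in user space. This is a minimal sketch, not the kernel code: `struct inner`, `struct wrapper`, `release_inner()` and `free_wrapper()` are hypothetical stand-ins for crypto_skcipher, crypto_sync_skcipher and their free helpers. The wrapper's free path checks its own pointer, so it stays correct even if `base` stops being the first member.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the inner object and its trivial wrapper. */
struct inner { int dummy; };
struct wrapper { struct inner base; };

static int inner_releases; /* counts releases, for illustration only */

/* Like many release helpers, tolerates NULL. */
static void release_inner(struct inner *p)
{
    if (!p)
        return;
    inner_releases++;
}

/*
 * Check the wrapper pointer itself instead of forming &w->base from a
 * NULL w, which only yields NULL while base sits at offset 0.
 */
static void free_wrapper(struct wrapper *w)
{
    if (!w)
        return;
    release_inner(&w->base);
    free(w);
}
```

With this shape, calling `free_wrapper(NULL)` is a no-op by construction rather than by accident of layout.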

    This bug was found by STCheck, a static analysis tool written by us.

    Fixes: 69d6302b65a8 ("libceph: Remove VLA usage of skcipher").
    Signed-off-by: Jia-Ju Bai
    Reviewed-by: Ilya Dryomov
    Signed-off-by: Ilya Dryomov

    Jia-Ju Bai
     
  • Pull NFS client bugfixes from Trond Myklebust:
    "Highlights include:

    Stable fixes:

    - Fix a page lock leak in nfs_pageio_resend()

    - Ensure O_DIRECT reports an error if the bytes read/written is 0

    - Don't handle errors if the bind/connect succeeded

    - Revert "NFSv4/flexfiles: Abort I/O early if the layout segment was
    invalidated"

    Bugfixes:

    - Don't refresh attributes with mounted-on-file information

    - Fix return values for nfs4_file_open() and nfs_finish_open()

    - Fix pnfs layoutstats reporting of I/O errors

    - Don't use soft RPC calls for pNFS/flexfiles I/O, and don't abort
    for soft I/O errors when the user specifies a hard mount.

    - Various fixes to the error handling in sunrpc

    - Don't report writepage()/writepages() errors twice"

    * tag 'nfs-for-5.3-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
    NFS: remove set but not used variable 'mapping'
    NFSv2: Fix write regression
    NFSv2: Fix eof handling
    NFS: Fix writepage(s) error handling to not report errors twice
    NFS: Fix spurious EIO read errors
    pNFS/flexfiles: Don't time out requests on hard mounts
    SUNRPC: Handle connection breakages correctly in call_status()
    Revert "NFSv4/flexfiles: Abort I/O early if the layout segment was invalidated"
    SUNRPC: Handle EADDRINUSE and ENOBUFS correctly
    pNFS/flexfiles: Turn off soft RPC calls
    SUNRPC: Don't handle errors if the bind/connect succeeded
    NFS: On fatal writeback errors, we need to call nfs_inode_remove_request()
    NFS: Fix initialisation of I/O result struct in nfs_pgio_rpcsetup
    NFS: Ensure O_DIRECT reports an error if the bytes read/written is 0
    NFSv4/pnfs: Fix a page lock leak in nfs_pageio_resend()
    NFSv4: Fix return value in nfs_finish_open()
    NFSv4: Fix return values for nfs4_file_open()
    NFS: Don't refresh attributes with mounted-on-file information

    Linus Torvalds
     
  • Pull networking fixes from David Miller:

    1) Use 32-bit index for tail calls in s390 bpf JIT, from Ilya
    Leoshkevich.

    2) Fix missed EPOLLOUT events in TCP, from Eric Dumazet. Same fix for
    SMC from Jason Baron.

    3) ipv6_mc_may_pull() should return 0 for malformed packets, not
    -EINVAL. From Stefano Brivio.

    4) Don't forget to unpin umem xdp pages in error path of
    xdp_umem_reg(). From Ivan Khoronzhuk.

    5) Fix sta object leak in mac80211, from Johannes Berg.

    6) Fix regression by not configuring PHYLINK on CPU port of bcm_sf2
    switches. From Florian Fainelli.

    7) Revert DMA sync removal from r8169 which was causing regressions on
    some MIPS Loongson platforms. From Heiner Kallweit.

    8) Use after free in flow dissector, from Jakub Sitnicki.

    9) Fix NULL derefs of net devices during ICMP processing across
    collect_md tunnels, from Hangbin Liu.

    10) proto_register() memory leaks, from Zhang Lin.

    11) Set NLM_F_MULTI flag in multipart netlink messages consistently,
    from John Fastabend.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (66 commits)
    r8152: Set memory to all 0xFFs on failed reg reads
    openvswitch: Fix conntrack cache with timeout
    ipv4: mpls: fix mpls_xmit for iptunnel
    nexthop: Fix nexthop_num_path for blackhole nexthops
    net: rds: add service level support in rds-info
    net: route dump netlink NLM_F_MULTI flag missing
    s390/qeth: reject oversized SNMP requests
    sock: fix potential memory leak in proto_register()
    MAINTAINERS: Add phylink keyword to SFF/SFP/SFP+ MODULE SUPPORT
    xfrm/xfrm_policy: fix dst dev null pointer dereference in collect_md mode
    ipv4/icmp: fix rt dst dev null pointer dereference
    openvswitch: Fix log message in ovs conntrack
    bpf: allow narrow loads of some sk_reuseport_md fields with offset > 0
    bpf: fix use after free in prog symbol exposure
    bpf: fix precision tracking in presence of bpf2bpf calls
    flow_dissector: Fix potential use-after-free on BPF_PROG_DETACH
    Revert "r8169: remove not needed call to dma_sync_single_for_device"
    ipv6: propagate ipv6_add_dev's error returns out of ipv6_find_idev
    net/ncsi: Fix the payload copying for the request coming from Netlink
    qed: Add cleanup in qed_slowpath_start()
    ...

    Linus Torvalds
     

27 Aug, 2019

4 commits


26 Aug, 2019

2 commits

  • This patch addresses a conntrack cache issue with timeout policy.
    Currently, we do not check whether the timeout extension is set properly in
    the cached conntrack entry. Thus, after a packet recirculates from the
    conntrack action, the timeout policy is not applied properly. This patch
    fixes the aforementioned issue.

    Fixes: 06bd2bdf19d2 ("openvswitch: Add timeout support to ct action")
    Reported-by: kbuild test robot
    Signed-off-by: Yi-Hung Wei
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Yi-Hung Wei
     
  • When using an mpls over gre/gre6 setup, the rt->rt_gw4 address is not set,
    and neither is rt->rt_gw_family. Therefore, when rt->rt_gw_family is
    checked in mpls_xmit(), the neigh_xmit() call is skipped. As a result, such
    a setup no longer works.

    This issue was found with LTP mpls03 tests.

    Fixes: 1550c171935d ("ipv4: Prepare rtable for IPv6 gateway")
    Signed-off-by: Alexey Kodanev
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Alexey Kodanev
     

25 Aug, 2019

7 commits

  • From the IB specification, 7.6.5 SERVICE LEVEL: Service Level (SL)
    is used to identify different flows within an IBA subnet.
    It is carried in the local route header of the packet.

    Before this commit, running "rds-info -I" gives the output
    below:
    "
    RDS IB Connections:
    LocalAddr   RemoteAddr  Tos  SL  LocalDev           RemoteDev
    192.2.95.3  192.2.95.1    2   0  fe80::21:28:1a:39  fe80::21:28:10:b9
    192.2.95.3  192.2.95.1    1   0  fe80::21:28:1a:39  fe80::21:28:10:b9
    192.2.95.3  192.2.95.1    0   0  fe80::21:28:1a:39  fe80::21:28:10:b9
    "
    After this commit, the output is as below:
    "
    RDS IB Connections:
    LocalAddr   RemoteAddr  Tos  SL  LocalDev           RemoteDev
    192.2.95.3  192.2.95.1    2   2  fe80::21:28:1a:39  fe80::21:28:10:b9
    192.2.95.3  192.2.95.1    1   1  fe80::21:28:1a:39  fe80::21:28:10:b9
    192.2.95.3  192.2.95.1    0   0  fe80::21:28:1a:39  fe80::21:28:10:b9
    "

    The commit fe3475af3bdf ("net: rds: add per rds connection cache
    statistics") adds cache_allocs in struct rds_info_rdma_connection
    as below:
    struct rds_info_rdma_connection {
        ...
        __u32 rdma_mr_max;
        __u32 rdma_mr_size;
        __u8 tos;
        __u32 cache_allocs;
    };
    The peer struct in rds-tools of struct rds_info_rdma_connection is as
    below:
    struct rds_info_rdma_connection {
        ...
        uint32_t rdma_mr_max;
        uint32_t rdma_mr_size;
        uint8_t tos;
        uint8_t sl;
        uint32_t cache_allocs;
    };
    The userspace and kernel structs differ in the member variable sl, which
    is missing from the kernel struct. The two sides can therefore disagree
    about the layout, so this commit adds sl to the kernel struct.
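The layout risk can be checked with offsetof(). The sketch below reproduces only the trailing fields quoted above (the leading fields are elided in the source and are left out here). The struct names are hypothetical, and the packed variants are an assumption added for illustration, since the quoted excerpt does not say whether the real structs are packed.

```c
#include <stddef.h>
#include <stdint.h>

/* Tail of the kernel-side struct as quoted above. */
struct kernel_tail {
    uint32_t rdma_mr_max;
    uint32_t rdma_mr_size;
    uint8_t  tos;
    uint32_t cache_allocs;
};

/* Tail of the rds-tools struct as quoted above. */
struct tools_tail {
    uint32_t rdma_mr_max;
    uint32_t rdma_mr_size;
    uint8_t  tos;
    uint8_t  sl;
    uint32_t cache_allocs;
};

/* Same fields, packed (GCC/Clang attribute; an assumption for illustration). */
struct kernel_tail_packed {
    uint32_t rdma_mr_max;
    uint32_t rdma_mr_size;
    uint8_t  tos;
    uint32_t cache_allocs;
} __attribute__((packed));

struct tools_tail_packed {
    uint32_t rdma_mr_max;
    uint32_t rdma_mr_size;
    uint8_t  tos;
    uint8_t  sl;
    uint32_t cache_allocs;
} __attribute__((packed));
```

In this sketch, with natural alignment, `sl` falls into a byte the kernel-side struct treats as padding, so `cache_allocs` still lines up. With packing, `cache_allocs` moves by one byte and the two sides read different data.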

    Fixes: fe3475af3bdf ("net: rds: add per rds connection cache statistics")
    CC: Joe Jin
    CC: JUNXIAO_BI
    Suggested-by: Gerd Rausch
    Signed-off-by: Zhu Yanjun
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Zhu Yanjun
     
  • An excerpt from netlink(7) man page,

    In multipart messages (multiple nlmsghdr headers with associated payload
    in one byte stream) the first and all following headers have the
    NLM_F_MULTI flag set, except for the last header which has the type
    NLMSG_DONE.

    However, after ee28906 there is a missing NLM_F_MULTI flag in the middle
    of a FIB dump. As a result, user space applications that follow the man
    page excerpt above may get confused and stop parsing the message,
    believing something went wrong.

    In the golang netlink lib [0] the library logic stops parsing believing the
    message is not a multipart message. Found this running Cilium[1] against
    net-next while adding a feature to auto-detect routes. I noticed with
    multiple route tables we no longer could detect the default routes on net
    tree kernels because the library logic was not returning them.
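The failure mode can be modelled with a strict reader that trusts the netlink(7) rule. This is a sketch under assumptions: `count_dump_messages()` is a hypothetical helper, not code from the Go library or the kernel, and it only uses the standard macros from <linux/netlink.h>.

```c
#include <linux/netlink.h>
#include <stddef.h>

/*
 * Count the data messages a strict reader accepts from a dump buffer.
 * It stops at NLMSG_DONE, and also stops after any message that lacks
 * NLM_F_MULTI, because that looks like a single-part reply.
 */
static int count_dump_messages(void *buf, int len)
{
    struct nlmsghdr *nlh = buf;
    int accepted = 0;

    for (; NLMSG_OK(nlh, len); nlh = NLMSG_NEXT(nlh, len)) {
        if (nlh->nlmsg_type == NLMSG_DONE)
            break;
        accepted++;
        if (!(nlh->nlmsg_flags & NLM_F_MULTI))
            break; /* everything after this message is never seen */
    }
    return accepted;
}
```

A dump whose middle message is missing the flag is cut short at that message, which matches the symptom of routes no longer being returned.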

    Fix this by handling the fib_dump_info_fnhe() case the same way the
    fib_dump_info() handles it by passing the flags argument through the
    call chain and adding a flags argument to rt_fill_info().

    Tested with Cilium stack and auto-detection of routes works again. Also
    annotated libs to dump netlink msgs and inspected NLM_F_MULTI and
    NLMSG_DONE flags look correct after this.

    Note: In inet_rtm_getroute() pass rt_fill_info() '0' for flags the same
    as is done for fib_dump_info() so this looks correct to me.

    [0] https://github.com/vishvananda/netlink/
    [1] https://github.com/cilium/

    Fixes: ee28906fd7a14 ("ipv4: Dump route exceptions if requested")
    Signed-off-by: John Fastabend
    Reviewed-by: Stefano Brivio
    Signed-off-by: David S. Miller

    John Fastabend
     
  • If the number of registered protocols exceeds PROTO_INUSE_NR, prot is
    still added to proto_list even though no bit is left for it in
    proto_inuse_idx.
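The intended behaviour can be sketched as a bitmap allocator whose failure is propagated to the registration step. All names here (`SLOT_COUNT`, `claim_slot()`, `register_thing()`) are hypothetical, and the -ENOSPC error code is an assumption chosen for the sketch, not taken from the patch.

```c
#include <errno.h>
#include <stdint.h>

#define SLOT_COUNT 64 /* hypothetical stand-in for PROTO_INUSE_NR */

static uint64_t slot_bitmap; /* bit i set => slot i is taken */

/* Returns the claimed slot index, or -ENOSPC when no slot is free. */
static int claim_slot(void)
{
    for (int i = 0; i < SLOT_COUNT; i++) {
        if (!(slot_bitmap & (UINT64_C(1) << i))) {
            slot_bitmap |= UINT64_C(1) << i;
            return i;
        }
    }
    return -ENOSPC;
}

/* Add to the list only if a slot was claimed; otherwise pass the error up. */
static int register_thing(int *list_len)
{
    int idx = claim_slot();

    if (idx < 0)
        return idx;
    (*list_len)++;
    return 0;
}
```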

    Changes since v2:
    * Propagate the error code properly

    Signed-off-by: zhanglin
    Signed-off-by: David S. Miller

    zhanglin
     
  • In decode_session{4,6} there is a possibility that the skb dst dev is NULL,
    e.g., with tunnel collect_md mode, which will cause a kernel crash.
    Here is what the code path looks like, for GRE:

    - ip6gre_tunnel_xmit
    - ip6gre_xmit_ipv6
    - __gre6_xmit
    - ip6_tnl_xmit
    - if skb->len - t->tun_hlen - eth_hlen > mtu; return -EMSGSIZE
    - icmpv6_send
    - icmpv6_route_lookup
    - xfrm_decode_session_reverse
    - decode_session4
    - oif = skb_dst(skb)->dev->ifindex; <-- here

    The reason is that __metadata_dst_init() initializes dst->dev to NULL by
    default. We could not fix it in __metadata_dst_init() as there is no dev
    supplied. On the other hand, the skb_dst(skb)->dev is actually not needed
    as we called decode_session{4,6} via xfrm_decode_session_reverse(), so oif
    is not used by: fl4->flowi4_oif = reverse ? skb->skb_iif : oif;

    So adding a dst dev check here should be clean and safe.
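A minimal user-space sketch of such a check follows. The struct and function names are hypothetical and only mirror the roles of the dst, its device and the quoted oif selection. Returning 0 when no device is attached is an assumption of the sketch.

```c
#include <stddef.h>

struct net_device_s { int ifindex; };       /* hypothetical */
struct dst_s { struct net_device_s *dev; }; /* dev may be NULL */

/* Returns the output ifindex, or 0 when the dst has no device attached. */
static int dst_oif(const struct dst_s *dst)
{
    if (dst && dst->dev)
        return dst->dev->ifindex;
    return 0;
}

/* Mirrors the quoted selection: in the reverse case oif is not used. */
static int pick_oif(int reverse, int skb_iif, const struct dst_s *dst)
{
    int oif = dst_oif(dst);

    return reverse ? skb_iif : oif;
}
```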

    v4: No changes.

    v3: No changes.

    v2: fix the issue in decode_session{4,6} instead of updating shared dst dev
    in {ip_md, ip6}_tunnel_xmit.

    Fixes: 8d79266bc48c ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
    Signed-off-by: Hangbin Liu
    Tested-by: Jonathan Lemon
    Signed-off-by: David S. Miller

    Hangbin Liu
     
  • In __icmp_send() there is a possibility that the rt->dst.dev is NULL,
    e.g., with tunnel collect_md mode, which will cause a kernel crash.
    Here is what the code path looks like, for GRE:

    - ip6gre_tunnel_xmit
    - ip6gre_xmit_ipv4
    - __gre6_xmit
    - ip6_tnl_xmit
    - if skb->len - t->tun_hlen - eth_hlen > mtu; return -EMSGSIZE
    - icmp_send
    - net = dev_net(rt->dst.dev); <-- here

    The reason is that __metadata_dst_init() initializes dst->dev to NULL by
    default. We could not fix it in __metadata_dst_init() as there is no dev
    supplied. On the other hand, the reason we need rt->dst.dev is to get the
    net. So we can just try to get it from skb->dev when rt->dst.dev is NULL.

    v4: Julian Anastasov pointed out that skb->dev could also be NULL. We had
    better still use dst.dev and do a check to avoid a crash.

    v3: No changes.

    v2: fix the issue in __icmp_send() instead of updating shared dst dev
    in {ip_md, ip6}_tunnel_xmit.

    Fixes: c8b34e680a09 ("ip_tunnel: Add tnl_update_pmtu in ip_md_tunnel_xmit")
    Signed-off-by: Hangbin Liu
    Reviewed-by: Julian Anastasov
    Acked-by: Jonathan Lemon
    Signed-off-by: David S. Miller

    Hangbin Liu
     
  • Fixes: 06bd2bdf19d2 ("openvswitch: Add timeout support to ct action")
    Signed-off-by: Yi-Hung Wei
    Signed-off-by: David S. Miller

    Yi-Hung Wei
     
  • …inux/kernel/git/sschmidt/wpan

    Stefan Schmidt says:

    ====================
    pull-request: ieee802154 for net 2019-08-24

    An update from ieee802154 for your *net* tree.

    Yue Haibing fixed two bugs discovered by KASAN in the hwsim driver for
    ieee802154 and Colin Ian King cleaned up a redundant variable assignment.
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>

    David S. Miller
     

24 Aug, 2019

5 commits

  • Daniel Borkmann says:

    ====================
    pull-request: bpf 2019-08-24

    The following pull-request contains BPF updates for your *net* tree.

    The main changes are:

    1) Fix verifier precision tracking with BPF-to-BPF calls, from Alexei.

    2) Fix a use-after-free in prog symbol exposure, from Daniel.

    3) Several s390x JIT fixes plus BE related fixes in BPF kselftests, from Ilya.

    4) Fix memory leak by unpinning XDP umem pages in error path, from Ivan.

    5) Fix a potential use-after-free on flow dissector detach, from Jakub.

    6) Fix bpftool to close prog fd after showing metadata, from Quentin.

    7) BPF kselftest config and TEST_PROGS_EXTENDED fixes, from Anders.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • test_select_reuseport fails on s390 due to verifier rejecting
    test_select_reuseport_kern.o with the following message:

    ; data_check.eth_protocol = reuse_md->eth_protocol;
    18: (69) r1 = *(u16 *)(r6 +22)
    invalid bpf_context access off=22 size=2

    This is because on big-endian machines casts from __u32 to __u16 are
    generated by referencing the respective variable as __u16 with an offset
    of 2 (as opposed to 0 on little-endian machines).

    The verifier already has all the infrastructure in place to allow such
    accesses, it's just that they are not explicitly enabled for
    eth_protocol field. Enable them for eth_protocol field by using
    bpf_ctx_range instead of offsetof.

    Ditto for ip_protocol, bind_inany and len, since they already allow
    narrowing, and the same problem can arise when working with them.
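The offset arithmetic behind the rejected access can be sketched in plain C. The helper names are hypothetical. If the 4-byte eth_protocol field starts at offset 20 (an assumption that is consistent with off=22 in the verifier message), a 2-byte load of its low half lands at 20 + 2 on a big-endian machine and at 20 + 0 on a little-endian one.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Offset, relative to the start of a wider field, at which a narrower
 * load finds the field's least significant bytes.
 */
static size_t narrow_load_offset(size_t field_size, size_t load_size,
                                 int big_endian)
{
    return big_endian ? field_size - load_size : 0;
}

/* Detects the byte order of the machine running this sketch. */
static int host_is_big_endian(void)
{
    uint16_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 0;
}

/* Reads the low 16 bits of a 32-bit field through a 2-byte load. */
static uint16_t load_low16(const uint32_t *field)
{
    uint16_t v;
    size_t off = narrow_load_offset(sizeof(*field), sizeof(v),
                                    host_is_big_endian());

    memcpy(&v, (const unsigned char *)field + off, sizeof(v));
    return v;
}
```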

    Fixes: 2dbb9b9e6df6 ("bpf: Introduce BPF_PROG_TYPE_SK_REUSEPORT")
    Signed-off-by: Ilya Leoshkevich
    Signed-off-by: Daniel Borkmann

    Ilya Leoshkevich
     
  • A call to bpf_prog_put(), with the help of call_rcu(), queues an RCU-callback to
    free the program once a grace period has elapsed. The callback can run
    together with new RCU readers that started after the last grace period.
    New RCU readers can potentially see the "old" to-be-freed or already-freed
    pointer to the program object before the RCU update-side NULLs it.

    Reorder the operations so that the RCU update-side resets the protected
    pointer before the end of the grace period after which the program will be
    freed.

    Fixes: d58e468b1112 ("flow_dissector: implements flow dissector BPF hook")
    Reported-by: Lorenz Bauer
    Signed-off-by: Jakub Sitnicki
    Acked-by: Petar Penkov
    Signed-off-by: Daniel Borkmann

    Jakub Sitnicki
     
  • Currently, ipv6_find_idev returns NULL when ipv6_add_dev fails,
    ignoring the specific error value. This results in addrconf_add_dev
    returning ENOBUFS in all cases, which is unfortunate in cases such as:

    # ip link add dummyX type dummy
    # ip link set dummyX mtu 1200 up
    # ip addr add 2000::/64 dev dummyX
    RTNETLINK answers: No buffer space available

    Commit a317a2f19da7 ("ipv6: fail early when creating netdev named all
    or default") introduced error returns in ipv6_add_dev. Before that,
    that function would simply return NULL for all failures.
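The difference between collapsing failures to NULL and propagating them can be sketched with user-space imitations of the kernel's ERR_PTR helpers. Everything below is hypothetical illustration: `err_ptr()`, `ptr_err()` and `is_err()` imitate ERR_PTR(), PTR_ERR() and IS_ERR(), and the choice of -EINVAL for a too-small MTU is an assumption of the sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095
#define MIN_MTU   1280 /* IPv6 minimum link MTU */

/* User-space imitations of the kernel's error-pointer helpers. */
static void *err_ptr(long err) { return (void *)(intptr_t)err; }
static long ptr_err(const void *p) { return (long)(intptr_t)p; }
static int is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct idev { int mtu; }; /* hypothetical per-device state */

static struct idev the_idev;

/* Hypothetical add step: fails with a specific error for a small MTU. */
static struct idev *add_dev(int mtu)
{
    if (mtu < MIN_MTU)
        return err_ptr(-EINVAL);
    the_idev.mtu = mtu;
    return &the_idev;
}

/* Old style: every failure becomes NULL, so the caller has to guess. */
static struct idev *find_idev_old(int mtu)
{
    struct idev *d = add_dev(mtu);

    return is_err(d) ? NULL : d;
}

/* New style: the encoded error is handed back unchanged. */
static struct idev *find_idev_new(int mtu)
{
    return add_dev(mtu);
}
```

A caller of the old variant can only report one fixed error for NULL, while a caller of the new variant can return `ptr_err()` of the result.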

    Signed-off-by: Sabrina Dubroca
    Signed-off-by: David S. Miller

    Sabrina Dubroca
     
  • Pull ceph fixes from Ilya Dryomov:
    "Three important fixes tagged for stable (an indefinite hang, a crash
    on an assert and a NULL pointer dereference) plus a small series from
    Luis fixing instances of vfree() under spinlock"

    * tag 'ceph-for-5.3-rc6' of git://github.com/ceph/ceph-client:
    libceph: fix PG split vs OSD (re)connect race
    ceph: don't try fill file_lock on unsuccessful GETFILELOCK reply
    ceph: clear page dirty before invalidate page
    ceph: fix buffer free while holding i_ceph_lock in fill_inode()
    ceph: fix buffer free while holding i_ceph_lock in __ceph_build_xattrs_blob()
    ceph: fix buffer free while holding i_ceph_lock in __ceph_setxattr()
    libceph: allow ceph_buffer_put() to receive a NULL ceph_buffer

    Linus Torvalds
     

23 Aug, 2019

1 commit


22 Aug, 2019

4 commits

  • We can't rely on ->peer_features in calc_target() because it may be
    called both when the OSD session is established and open and when it's
    not. ->peer_features is not valid unless the OSD session is open. If
    this happens on a PG split (pg_num increase), that could mean we don't
    resend a request that should have been resent, hanging the client
    indefinitely.

    In userspace this was fixed by looking at require_osd_release and
    get_xinfo[osd].features fields of the osdmap. However these fields
    belong to the OSD section of the osdmap, which the kernel doesn't
    decode (only the client section is decoded).

    Instead, let's drop this feature check. It effectively checks for
    luminous, so only pre-luminous OSDs would be affected in that on a PG
    split the kernel might resend a request that should not have been
    resent. Duplicates can occur in other scenarios, so both sides should
    already be prepared for them: see dup/replay logic on the OSD side and
    retry_attempt check on the client side.

    Cc: stable@vger.kernel.org
    Fixes: 7de030d6b10a ("libceph: resend on PG splits if OSD has RESEND_ON_SPLIT")
    Link: https://tracker.ceph.com/issues/41162
    Reported-by: Jerry Lee
    Signed-off-by: Ilya Dryomov
    Tested-by: Jerry Lee
    Reviewed-by: Jeff Layton

    Ilya Dryomov
     
  • It expects an unsigned int, but got a __be32.

    Signed-off-by: Li RongQing
    Signed-off-by: Zhang Yu
    Signed-off-by: David S. Miller

    Li RongQing
     
  • In commit 93a714d6b53d ("multicast: Extend ip address command to enable
    multicast group join/leave on") we added a new flag IFA_F_MCAUTOJOIN
    to let users add a multicast address on an ethernet interface.

    This works for IPv4, but not for IPv6. See the inet6_addr_add code.

    static int inet6_addr_add()
    {
        ...
        if (cfg->ifa_flags & IFA_F_MCAUTOJOIN) {
            ipv6_mc_config(net->ipv6.mc_autojoin_sk, true...)
        }

        /* always fails for a multicast address */
        ifp = ipv6_add_addr(idev, cfg, true, extack);
        if (!IS_ERR(ifp)) {
            ...
        } else if (cfg->ifa_flags & IFA_F_MCAUTOJOIN) {
            ipv6_mc_config(net->ipv6.mc_autojoin_sk, false...)
        }
    }

    But ipv6_add_addr() checks the address type and rejects multicast
    addresses directly. So this feature has never worked for IPv6.

    We should not remove the multicast address check in ipv6_add_addr()
    entirely, but we could accept a multicast address only when the
    IFA_F_MCAUTOJOIN flag is supplied.

    v2: update commit description

    Fixes: 93a714d6b53d ("multicast: Extend ip address command to enable multicast group join/leave on")
    Reported-by: Jianlin Shi
    Signed-off-by: Hangbin Liu
    Signed-off-by: David S. Miller

    Hangbin Liu
     
  • Simon Wunderlich says:

    ====================
    Here is a batman-adv bugfix:

    - fix uninit-value in batadv_netlink_get_ifindex(), by Eric Dumazet
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

21 Aug, 2019

6 commits

  • This reverts commit 96cce12ff6e0 ("cfg80211: fix processing world
    regdomain when non modular").

    Re-triggering a reg_process_hint with the last request on all events,
    can make the regulatory domain fail in case of multiple WiFi modules. On
    slower boards (especially with mdev), enumeration of the WiFi modules
    can end up in an intersected regulatory domain, and user cannot set it
    with 'iw reg set' anymore.

    This is happening, because:
    - 1st module enumerates, queues up a regulatory request
    - request gets processed by __reg_process_hint_driver():
    - checks if previous was set by CORE -> yes
    - checks if regulatory domain changed -> yes, from '00' to e.g. 'US'
    -> sends request to the 'crda'
    - 2nd module enumerates, queues up a regulatory request (which triggers
    the reg_todo() work)
    - reg_todo() -> reg_process_pending_hints() sees, that the last request
    is not processed yet, so it tries to process it again.
    __reg_process_hint_driver() will run again, and:
    - checks if the last request's initiator was the core -> no, it was
    the driver (1st WiFi module)
    - checks, if the previous initiator was the driver -> yes
    - checks if the regulatory domain changed -> yes, it was '00' (set by
    core, and crda call did not return yet), and should be changed to 'US'

    ------> __reg_process_hint_driver calls an intersect

    Besides, the reg_process_hint call with the last request is meaningless
    since the crda call has a timeout work. If that timeout expires, the
    first module's request will be lost.

    Cc: stable@vger.kernel.org
    Fixes: 96cce12ff6e0 ("cfg80211: fix processing world regdomain when non modular")
    Signed-off-by: Robert Hodaszi
    Link: https://lore.kernel.org/r/20190614131600.GA13897@a1-hr
    Signed-off-by: Johannes Berg

    Hodaszi, Robert
     
  • Fix two shortcomings in the Extended Key ID API:

    1) Allow the userspace to install pairwise keys using keyid 1 without
    NL80211_KEY_NO_TX set. This allows the userspace to install and
    activate pairwise keys with keyid 1 in the same way as for keyid 0,
    simplifying the API usage for e.g. FILS and FT key installs.

    2) IEEE 802.11 - 2016 restricts Extended Key ID usage to CCMP/GCMP
    ciphers in "9.4.2.25.4 RSN capabilities".
    Enforce that when installing a key.

    Cc: stable@vger.kernel.org # 5.2
    Fixes: 6cdd3979a2bd ("nl80211/cfg80211: Extended Key ID support")
    Signed-off-by: Alexander Wetzel
    Link: https://lore.kernel.org/r/20190805123400.51567-1-alexander@wetzel-home.de
    Signed-off-by: Johannes Berg

    Alexander Wetzel
     
  • If TDLS station addition is rejected, the sta memory is leaked.
    Avoid this by moving the check before the allocation.

    Cc: stable@vger.kernel.org
    Fixes: 7ed5285396c2 ("mac80211: don't initiate TDLS connection if station is not associated to AP")
    Link: https://lore.kernel.org/r/20190801073033.7892-1-johannes@sipsolutions.net
    Signed-off-by: Johannes Berg

    Johannes Berg
     
  • In ip_mc_inc_group(), a memory allocation flag, not an mcast mode, is
    expected by __ip_mc_inc_group().

    There is a similar issue in __ip_mc_join_group(): both the mcast mode and
    gfp_t are needed here, so use ____ip_mc_inc_group(...).

    Fixes: 9fb20801dab4 ("net: Fix ip_mc_{dec,inc}_group allocation context")
    Signed-off-by: Li RongQing
    Signed-off-by: Florian Fainelli
    Signed-off-by: Zhang Yu
    Signed-off-by: David S. Miller

    Li RongQing
     
  • The NCSI spec indicates that if the data does not end on a 32 bit
    boundary, one to three padding bytes equal to 0x00 shall be present to
    align the checksum field to a 32-bit boundary.
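The padding rule reduces to a small modular computation. The sketch below uses a hypothetical helper name and is not the driver code.

```c
#include <stddef.h>

/* Number of 0x00 pad bytes needed so the next field is 32-bit aligned. */
static size_t pad_to_32bit(size_t payload_len)
{
    return (4 - payload_len % 4) % 4;
}
```

The outer `% 4` keeps an already aligned payload at zero padding instead of four bytes.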

    Signed-off-by: Terry S. Duncan
    Signed-off-by: David S. Miller

    Terry S. Duncan
     
  • Currently, we are only explicitly setting SOCK_NOSPACE on a write timeout
    for non-blocking sockets. Epoll() edge-trigger mode relies on SOCK_NOSPACE
    being set when -EAGAIN is returned to ensure that EPOLLOUT is raised.
    Expand the setting of SOCK_NOSPACE to blocking sockets as well, since they
    can use SO_SNDTIMEO to adjust their write timeout. This mirrors the
    behavior that Eric Dumazet introduced for tcp sockets.

    Signed-off-by: Jason Baron
    Cc: Eric Dumazet
    Cc: Ursula Braun
    Cc: Karsten Graul
    Signed-off-by: David S. Miller

    Jason Baron
     

20 Aug, 2019

3 commits

  • Fix a memory leak caused by a missing unpin routine for umem pages.

    Fixes: 8aef7340ae9695 ("xsk: introduce xdp_umem_page")
    Signed-off-by: Ivan Khoronzhuk
    Acked-by: Jonathan Lemon
    Signed-off-by: Daniel Borkmann

    Ivan Khoronzhuk
     
  • Pablo Neira Ayuso says:

    ====================
    Netfilter fixes for net

    The following patchset contains Netfilter fixes for net:

    1) Remove IP MASQUERADING record in MAINTAINERS file,
    from Denis Efremov.

    2) Counter arguments are swapped in ebtables, from
    Todd Seidelmann.

    3) Missing netlink attribute validation in flow_offload
    extension.

    4) Incorrect alignment in xt_nfacct that breaks 32-bits
    userspace / 64-bits kernels, from Juliana Rodrigueiro.

    5) Missing include guard in nf_conntrack_h323_types.h,
    from Masahiro Yamada.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • As Jason Baron explained in commit 790ba4566c1a ("tcp: set SOCK_NOSPACE
    under memory pressure"), it is crucial we properly set SOCK_NOSPACE
    when needed.

    However, Jason's patch had a bug, because the 'nonblocking' status
    as far as sk_stream_wait_memory() is concerned is governed
    by the MSG_DONTWAIT flag passed at sendmsg() time:

    long timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);

    So it is very possible that tcp sendmsg() calls sk_stream_wait_memory(),
    and that sk_stream_wait_memory() returns -EAGAIN with SOCK_NOSPACE
    cleared, if sk->sk_sndtimeo has been set to a small (but not zero)
    value.

    This patch removes the 'noblock' variable since we must always
    set SOCK_NOSPACE if -EAGAIN is returned.

    It also renames the do_nonblock label since we might reach this
    code path even if we were in blocking mode.
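The bug and the fix can be modelled in a few lines of user-space C. This is a simplified sketch, not the kernel function: `struct sock_s`, `DONTWAIT_FLAG` and the two `wait_memory_*()` helpers are hypothetical, and the model assumes send memory never becomes available, so every wait ends in -EAGAIN.

```c
#include <errno.h>

#define DONTWAIT_FLAG 0x40 /* hypothetical stand-in for MSG_DONTWAIT */

struct sock_s {
    long sndtimeo; /* send timeout; small but non-zero in the bad case */
    int nospace;   /* models the SOCK_NOSPACE bit */
};

/* Same shape as the quoted call: non-blocking means a zero timeout. */
static long sndtimeo(const struct sock_s *sk, int noblock)
{
    return noblock ? 0 : sk->sndtimeo;
}

/* Buggy model: the flag is only set when the timeout was zero. */
static int wait_memory_buggy(struct sock_s *sk, int flags)
{
    long timeo = sndtimeo(sk, flags & DONTWAIT_FLAG);
    int noblock = (timeo == 0);

    if (noblock)
        sk->nospace = 1;
    return -EAGAIN; /* the timeout ran out without free memory */
}

/* Fixed model: the flag is always set when -EAGAIN is returned. */
static int wait_memory_fixed(struct sock_s *sk, int flags)
{
    (void)sndtimeo(sk, flags & DONTWAIT_FLAG);
    sk->nospace = 1;
    return -EAGAIN;
}
```

In the buggy model, a blocking call with a short timeout returns -EAGAIN while leaving the flag clear, which is the case where an edge-triggered EPOLLOUT would be missed.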

    Fixes: 790ba4566c1a ("tcp: set SOCK_NOSPACE under memory pressure")
    Signed-off-by: Eric Dumazet
    Cc: Jason Baron
    Reported-by: Vladimir Rutsky
    Acked-by: Soheil Hassas Yeganeh
    Acked-by: Neal Cardwell
    Acked-by: Jason Baron
    Signed-off-by: David S. Miller

    Eric Dumazet
     

19 Aug, 2019

4 commits

  • When running a 64-bit kernel with a 32-bit iptables binary, the size of
    the xt_nfacct_match_info struct diverges.

    kernel: sizeof(struct xt_nfacct_match_info) : 40
    iptables: sizeof(struct xt_nfacct_match_info) : 36

    Trying to append nfacct related rules results in an unhelpful message.
    Although it is suggested to look for more information in dmesg, nothing
    can be found there.

    # iptables -A -m nfacct --nfacct-name
    iptables: Invalid argument. Run `dmesg' for more information.

    This patch fixes the memory misalignment by enforcing 8-byte alignment
    within the struct's first revision. This solution is used in many
    other uapi netfilter headers.
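The 36 versus 40 byte split, and the effect of forcing 8-byte alignment, can be reproduced on any host by using fixed-width integers in place of the pointer. The struct names, the member names and the 32-byte name buffer are assumptions of this sketch, and the aligned attribute is GCC/Clang syntax.

```c
#include <stddef.h>
#include <stdint.h>

#define NAME_BUF_LEN 32 /* assumed size of the name buffer */

/* As a 32-bit binary sees it: the pointer is 4 bytes wide. */
struct info_user32 {
    char     name[NAME_BUF_LEN];
    uint32_t ptr;
};

/* As a 64-bit kernel sees it: the pointer is 8 bytes wide. */
struct info_kernel64 {
    char     name[NAME_BUF_LEN];
    uint64_t ptr;
};

/* 32-bit view with the pointer member forced to 8-byte alignment. */
struct info_user32_aligned {
    char     name[NAME_BUF_LEN];
    uint32_t ptr __attribute__((aligned(8)));
};
```

With the attribute, the struct's own alignment becomes 8, so its size is rounded up to 40 in the 32-bit view and both sides agree.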

    Signed-off-by: Juliana Rodrigueiro
    Acked-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Juliana Rodrigueiro
     
  • The netlink attribute policy for NFTA_FLOW_TABLE_NAME is missing.

    Fixes: a3c90f7a2323 ("netfilter: nf_tables: flow offload expression")
    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • The ordering of arguments to the x_tables ADD_COUNTER macro
    appears to be wrong in ebtables (cf. ip_tables.c, ip6_tables.c,
    and arp_tables.c).

    This causes data corruption in the ebtables userspace tools
    because they get incorrect packet & byte counts from the kernel.
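The effect of swapped arguments is easy to see with a stand-in macro. The macro below is written for this sketch with an assumed (counter, bytes, packets) parameter order, and the two account_*() helpers are hypothetical.

```c
#include <stdint.h>

struct counters {
    uint64_t pcnt; /* packets */
    uint64_t bcnt; /* bytes */
};

/* Stand-in macro; the (counter, bytes, packets) order is an assumption. */
#define ADD_COUNTER(c, b, p) \
    do { (c).bcnt += (b); (c).pcnt += (p); } while (0)

/* Correct call: byte count first, then one packet. */
static void account_correct(struct counters *c, uint64_t len)
{
    ADD_COUNTER(*c, len, 1);
}

/* Swapped call: the packet length ends up in the packet counter. */
static void account_swapped(struct counters *c, uint64_t len)
{
    ADD_COUNTER(*c, 1, len);
}
```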

    Fixes: d72133e628803 ("netfilter: ebtables: use ADD_COUNTER macro")
    Signed-off-by: Todd Seidelmann
    Signed-off-by: Pablo Neira Ayuso

    Todd Seidelmann
     
  • This patch adds initial support for offloading basechains using the
    priority range from 1 to 65535. This restricts the netfilter priority
    range to a 16-bit integer, since this is what most drivers assume so far
    from tc. It should be possible to extend the range of supported
    priorities later on, once drivers are updated to support 32-bit
    integer priorities.
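The range restriction amounts to a simple bounds check before offloading. The helper name and the -EOPNOTSUPP return value are assumptions of this sketch and are not taken from the patch.

```c
#include <errno.h>

#define OFFLOAD_PRIO_MIN 1
#define OFFLOAD_PRIO_MAX 65535 /* largest value a 16-bit field can hold */

/* Returns 0 if the chain priority can be offloaded, or a negative errno. */
static int check_offload_priority(int priority)
{
    if (priority < OFFLOAD_PRIO_MIN || priority > OFFLOAD_PRIO_MAX)
        return -EOPNOTSUPP;
    return 0;
}
```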

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso