05 Aug, 2017

1 commit

  • Pull ceph fixes from Ilya Dryomov:
    "A bunch of fixes and follow-ups for -rc1 Luminous patches: issues with
    ->reencode_message() and last minute RADOS semantic changes in
    v12.1.2"

    * tag 'ceph-for-4.13-rc4' of git://github.com/ceph/ceph-client:
    libceph: make RECOVERY_DELETES feature create a new interval
    libceph: upmap semantic changes
    crush: assume weight_set != null imples weight_set_size > 0
    libceph: fallback for when there isn't a pool-specific choose_arg
    libceph: don't call ->reencode_message() more than once per message
    libceph: make encode_request_*() work with r_mempool requests

    Linus Torvalds
     

01 Aug, 2017

9 commits

  • This is needed so that the OSDs can regenerate the missing set at the
    start of a new interval where support for recovery deletes changed.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil

    Ilya Dryomov
     
  • - apply both pg_upmap and pg_upmap_items
    - allow bidirectional swap of pg-upmap-items

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil

    Ilya Dryomov
     
  • Reflects ceph.git commit 5e8fa3e06b68fae1582c9230a3a8d1abc6146286.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil

    Ilya Dryomov
     
  • There is now a fallback to a choose_arg index of -1 if there isn't
    a pool-specific choose_arg set. If you create a per-pool weight-set,
    that works for that pool. Otherwise we try the compat/default one. If
    that doesn't exist either, then we use the normal CRUSH weights.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil

    Ilya Dryomov
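
The fallback order described in the commit above can be sketched in user-space C. This is only an illustration under stated assumptions: `choose_arg_map`, `lookup_choose_arg()` and `pick_choose_arg()` are hypothetical names and simplified types, not the libceph or CRUSH API.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a per-index choose_arg set. */
struct choose_arg_map {
    long long index;     /* pool id, or -1 for the compat/default set */
    const char *name;    /* label used only for this illustration */
};

/* Linear lookup by index; returns NULL when no entry matches. */
static const struct choose_arg_map *
lookup_choose_arg(const struct choose_arg_map *maps, size_t n, long long index)
{
    for (size_t i = 0; i < n; i++)
        if (maps[i].index == index)
            return &maps[i];
    return NULL;
}

/*
 * Try the pool-specific set first, then the compat/default set (index -1).
 * A NULL result stands for "use the normal CRUSH weights".
 */
static const struct choose_arg_map *
pick_choose_arg(const struct choose_arg_map *maps, size_t n, long long pool_id)
{
    const struct choose_arg_map *m = lookup_choose_arg(maps, n, pool_id);

    if (!m)
        m = lookup_choose_arg(maps, n, -1);
    return m;
}
```

With one set for pool 3 and one compat set, a lookup for pool 3 returns the pool set, a lookup for any other pool returns the compat set, and with no compat set the result is NULL.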
     
  • Reencoding an already reencoded message is a bad idea. This could
    happen on Policy::stateful_server connections (!CEPH_MSG_CONNECT_LOSSY),
    such as MDS sessions.

    This didn't pop up in testing because currently only OSD requests are
    reencoded and OSD sessions are always lossy.

    Fixes: 98ad5ebd1505 ("libceph: ceph_connection_operations::reencode_message() method")
    Signed-off-by: Ilya Dryomov
    Reviewed-by: "Yan, Zheng"

    Ilya Dryomov
     
  • Messages allocated out of ceph_msgpool have a fixed front length
    (pool->front_len). Asserting that the entire front has been filled
    while encoding is thus wrong.

    Fixes: 8cb441c0545d ("libceph: MOSDOp v8 encoding (actual spgid + full hash)")
    Reported-by: "Yan, Zheng"
    Signed-off-by: Ilya Dryomov
    Reviewed-by: "Yan, Zheng"

    Ilya Dryomov
     
  • Pull networking fixes from David Miller:

    1) Handle notifier registry failures properly in tun/tap driver, from
    Tonghao Zhang.

    2) Fix bpf verifier handling of subtraction bounds and add a testcase
    for this, from Edward Cree.

    3) Increase reset timeout in ftgmac100 driver, from Ben Herrenschmidt.

    4) Fix use after free in prb_retire_rx_blk_timer_expired() in AF_PACKET,
    from Cong Wang.

    5) Fix SELinux regression due to recent UDP optimizations, from Paolo
    Abeni.

    6) We accidentally increment IPSTATS_MIB_FRAGFAILS in the ipv6 code
    paths, fix from Stefano Brivio.

    7) Fix some mem leaks in dccp, from Xin Long.

    8) Adjust MDIO_BUS kconfig deps to avoid build errors, from Arnd
    Bergmann.

    9) Mac address length check and buffer size fixes from Cong Wang.

    10) Don't leak sockets in ipv6 udp early demux, from Paolo Abeni.

    11) Fix return value when copy_from_user() fails in
    bpf_prog_get_info_by_fd(), from Daniel Borkmann.

    12) Handle PHY_HALTED properly in phy library state machine, from
    Florian Fainelli.

    13) Fix OOPS in fib_sync_down_dev(), from Ido Schimmel.

    14) Fix truesize calculation in virtio_net which led to performance
    regressions, from Michael S Tsirkin.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (76 commits)
    samples/bpf: fix bpf tunnel cleanup
    udp6: fix jumbogram reception
    ppp: Fix a scheduling-while-atomic bug in del_chan
    Revert "net: bcmgenet: Remove init parameter from bcmgenet_mii_config"
    virtio_net: fix truesize for mergeable buffers
    mv643xx_eth: fix of_irq_to_resource() error check
    MAINTAINERS: Add more files to the PHY LIBRARY section
    ipv4: fib: Fix NULL pointer deref during fib_sync_down_dev()
    net: phy: Correctly process PHY_HALTED in phy_stop_machine()
    sunhme: fix up GREG_STAT and GREG_IMASK register offsets
    bpf: fix bpf_prog_get_info_by_fd to dump correct xlated_prog_len
    tcp: avoid bogus gcc-7 array-bounds warning
    net: tc35815: fix spelling mistake: "Intterrupt" -> "Interrupt"
    bpf: don't indicate success when copy_from_user fails
    udp6: fix socket leak on early demux
    net: thunderx: Fix BGX transmit stall due to underflow
    Revert "vhost: cache used event for better performance"
    team: use a larger struct for mac address
    net: check dev->addr_len for dev_set_mac_address()
    phy: bcm-ns-usb3: fix MDIO_BUS dependency
    ...

    Linus Torvalds
     
  • Since commit 67a51780aebb ("ipv6: udp: leverage scratch area
    helpers") udp6_recvmsg() read the skb len from the scratch area,
    to avoid a cache miss.
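    But the UDP6 rx path supports RFC 2675 UDPv6 jumbograms, and their
    length exceeds the 16 bits available in the scratch area. As a side
    effect the length returned by recvmsg() is:
    <ingress datagram len> % (1<<16)

    This commit addresses the issue by allocating one more bit in the
    IP6CB flags field and setting it for incoming jumbograms.
    That field is still in the first cacheline, so at recvmsg()
    time we can check it and fall back to accessing skb->len if
    required, without a measurable overhead.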

    Fixes: 67a51780aebb ("ipv6: udp: leverage scratch area helpers")
    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Paolo Abeni
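
The truncation described in the commit above is plain integer arithmetic and can be shown in isolation. The sketch below is an assumption-laden illustration, not the kernel code: `len_via_16bit_scratch()`, `recv_len()` and the `is_jumbo` flag are hypothetical names, and the fallback only mirrors the idea of reading the full length when the cached 16-bit value cannot be trusted.

```c
#include <stdint.h>

/* What a 16-bit field keeps of a datagram length. */
static uint32_t len_via_16bit_scratch(uint32_t datagram_len)
{
    return (uint16_t)datagram_len;   /* same as datagram_len % (1 << 16) */
}

/*
 * Use the cached 16-bit value unless the packet is marked as a jumbogram,
 * in which case fall back to the full length.
 */
static uint32_t recv_len(uint32_t full_len, uint16_t cached_len, int is_jumbo)
{
    return is_jumbo ? full_len : cached_len;
}
```

For a 70000-byte jumbogram the 16-bit field holds 70000 - 65536 = 4464, which is the wrong value recvmsg() would report without the fallback.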
     
  • Michał reported a NULL pointer deref during fib_sync_down_dev() when
    unregistering a netdevice. The problem is that we don't check for
    'in_dev' being NULL, which can happen in very specific cases.

    Usually routes are flushed upon NETDEV_DOWN sent in either the netdev or
    the inetaddr notification chains. However, if an interface isn't
    configured with any IP address, then it's possible for host routes to be
    flushed following NETDEV_UNREGISTER, after NULLing dev->ip_ptr in
    inetdev_destroy().

    To reproduce:
    $ ip link add type dummy
    $ ip route add local 1.1.1.0/24 dev dummy0
    $ ip link del dev dummy0

    Fix this by checking for the presence of 'in_dev' before referencing it.

    Fixes: 982acb97560c ("ipv4: fib: Notify about nexthop status changes")
    Signed-off-by: Ido Schimmel
    Reported-by: Michał Mirosław
    Tested-by: Michał Mirosław
    Signed-off-by: David S. Miller

    Ido Schimmel
     

30 Jul, 2017

3 commits

  • When using CONFIG_UBSAN_SANITIZE_ALL, the TCP code produces a
    false-positive warning:

    net/ipv4/tcp_output.c: In function 'tcp_connect':
    net/ipv4/tcp_output.c:2207:40: error: array subscript is below array bounds [-Werror=array-bounds]
    tp->chrono_stat[tp->chrono_type - 1] += now - tp->chrono_start;
    ^~
    net/ipv4/tcp_output.c:2207:40: error: array subscript is below array bounds [-Werror=array-bounds]
    tp->chrono_stat[tp->chrono_type - 1] += now - tp->chrono_start;
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~

    I have opened a gcc bug for this, but distros have already shipped
    compilers with this problem, and it's not clear yet whether there is
    a way for gcc to avoid the warning. As the problem is related to the
    bitfield access, this introduces a temporary variable to store the old
    enum value.

    I did not notice this warning earlier, since UBSAN is disabled when
    building with COMPILE_TEST, and that was always turned on in both
    allmodconfig and randconfig tests.

    Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81601
    Signed-off-by: Arnd Bergmann
    Signed-off-by: David S. Miller

    Arnd Bergmann
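
The workaround described above (copy the bitfield into a temporary before using it as an array index) can be sketched outside the kernel as follows. The struct, enum and function names are hypothetical and only loosely modeled on the statement quoted in the warning; this is not the actual patch.

```c
#include <stdint.h>

enum chrono_kind {
    CHRONO_UNSPEC,
    CHRONO_BUSY,
    CHRONO_RWND_LIMITED,
    CHRONO_SNDBUF_LIMITED
};

struct chrono_state {
    uint32_t start;
    uint32_t stat[3];
    unsigned int type : 2;   /* bitfield, like the field in the warning */
};

static void chrono_set(struct chrono_state *s, enum chrono_kind new_type,
                       uint32_t now)
{
    /* Read the bitfield once into a plain variable ... */
    enum chrono_kind old = (enum chrono_kind)s->type;

    /* ... so the index below is visibly >= 0 after the range check. */
    if (old > CHRONO_UNSPEC)
        s->stat[old - 1] += now - s->start;
    s->start = now;
    s->type = new_type;
}
```

The behaviour is the same as indexing with the bitfield directly; only the way the compiler sees the index changes.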
     
  • When an early demuxed packet reaches __udp6_lib_lookup_skb(), the
    sk reference is retrieved and used, but the relevant reference
    count is leaked and the socket destructor is never called.
    Beyond leaking the sk memory, if there are pending UDP packets
    in the receive queue, even the related accounted memory is leaked.

    In the long run, this will cause persistent forward allocation errors
    and no UDP skbs (both ipv4 and ipv6) will be able to reach the
    user-space.

    Fix this by explicitly accessing the early demux reference before
    the lookup, and properly decreasing the socket reference count
    after usage.

    Also drop the skb_steal_sock() in __udp6_lib_lookup_skb(), and
    the now obsoleted comment about "socket cache".

    The newly added code is derived from the current ipv4 code for the
    similar path.

    v1 -> v2:
    fixed the __udp6_lib_rcv() return code for resubmission,
    as suggested by Eric

    Reported-by: Sam Edwards
    Reported-by: Marc Haber
    Fixes: 5425077d73e0 ("net: ipv6: Add early demux handler for UDP unicast")
    Signed-off-by: Paolo Abeni
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Paolo Abeni
     
  • Historically, dev_ifsioc() uses struct sockaddr as the mac
    address definition, which is why dev_set_mac_address()
    accepts a struct sockaddr pointer as input. But now we
    have various types of mac addresses whose lengths
    are up to MAX_ADDR_LEN, longer than struct sockaddr,
    and saved in dev->addr_len.

    It is too late to fix dev_ifsioc() due to API
    compatibility, so just reject those larger than
    sizeof(struct sockaddr), otherwise we would read
    and use some random bytes from kernel stack.

    Fortunately, only a few IPv6 tunnel devices have addr_len
    larger than sizeof(struct sockaddr) and they don't support
    ndo_set_mac_addr(). But with the team driver in lb mode, they
    can still be enslaved to a team master, which then takes on
    the same mac address length.

    Cc: Jiri Pirko
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    WANG Cong
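
The length check described above amounts to a single comparison. A minimal user-space sketch, assuming a POSIX `struct sockaddr`; `check_mac_len()` is a hypothetical helper, not a kernel function:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Returns 0 if a hardware address of addr_len bytes may be passed through
 * a struct sockaddr based interface, -EINVAL if it is too long.
 */
static int check_mac_len(size_t addr_len)
{
    if (addr_len > sizeof(struct sockaddr))
        return -EINVAL;
    return 0;
}
```

A 6-byte Ethernet address passes, while anything longer than sizeof(struct sockaddr) is rejected instead of being read from past the end of the caller's buffer.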
     

27 Jul, 2017

5 commits

  • In dccp_feat_init, when ccid_get_builtin_ccids fails to alloc
    memory for rx.val, it should free tx.val before returning an
    error.

    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     
  • The patch "dccp: fix a memleak that dccp_ipv6 doesn't put reqsk
    properly" fixed reqsk refcnt leak for dccp_ipv6. The same issue
    exists on dccp_ipv4.

    This patch is to fix it for dccp_ipv4.

    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     
  • In dccp_v6_conn_request, after reqsk gets allocated and hashed into
    the ehash table, reqsk's refcnt is set to 3: one is for req->rsk_timer,
    one is for the hlist, and the other one is for the current user.

    The problem is when dccp_v6_conn_request returns and finishes using
    reqsk, it doesn't put reqsk. This will cause reqsk refcnt leaks and
    reqsk obj never gets freed.

    Jianlin found this issue when running dccp_memleak.c in a loop: the
    system would run out of memory.

    dccp_memleak.c:
    int s1 = socket(PF_INET6, 6, IPPROTO_IP);
    bind(s1, &sa1, 0x20);
    listen(s1, 0x9);
    int s2 = socket(PF_INET6, 6, IPPROTO_IP);
    connect(s2, &sa1, 0x20);
    close(s1);
    close(s2);

    This patch is to put the reqsk before dccp_v6_conn_request returns,
    just as what tcp_conn_request does.

    Reported-by: Jianlin Shi
    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
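
The reference counting described above can be modeled with a toy counter. Everything here is hypothetical (`reqsk_sketch` and its helpers are not kernel types or functions); the point is only that three references are taken and the caller has to drop its own.

```c
/* Toy model of a reference-counted request object. */
struct reqsk_sketch {
    int refcnt;
    int freed;
};

/* Drop one reference; the object is "freed" when the count reaches zero. */
static void reqsk_sketch_put(struct reqsk_sketch *req)
{
    if (--req->refcnt == 0)
        req->freed = 1;
}

/* Allocation plus hashing leaves three references: timer, hash list, caller. */
static void reqsk_sketch_init(struct reqsk_sketch *req)
{
    req->refcnt = 3;
    req->freed = 0;
}

/* The caller drops its own reference once it has finished with the request. */
static void conn_request_finish(struct reqsk_sketch *req)
{
    reqsk_sketch_put(req);
}
```

If `conn_request_finish()` is never called, the count stays at 1 after the timer and hash list references are gone and the object is never freed, which is the leak the commit describes.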
     
  • Apparently netpoll_setup() assumes that netpoll.dev_name is a pointer
    when checking if the device name is set:

    if (np->dev_name) {
    ...

    However the field is a character array, therefore the condition always
    yields true. Check instead whether the first byte of the array has a
    non-zero value.

    Signed-off-by: Matthias Kaehlcke
    Signed-off-by: David S. Miller

    Matthias Kaehlcke
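
The difference between testing an array member and testing its first byte can be shown with a small stand-alone struct. `np_sketch` and `dev_name_is_set()` are hypothetical names for this sketch; the array size of 16 is an arbitrary assumption.

```c
/* Stand-in for a struct that embeds the name as a character array. */
struct np_sketch {
    char dev_name[16];
};

/*
 * Correct test: look at the first byte of the array. Writing
 * "if (np->dev_name)" instead would test the array's address, which is
 * never NULL, so that condition is always true.
 */
static int dev_name_is_set(const struct np_sketch *np)
{
    return np->dev_name[0] != '\0';
}
```

An instance initialized with an empty string reports "not set", and one initialized with a name such as "eth0" reports "set".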
     
  • We must use a pre-processor conditional block or suitable accessors to
    manipulate skb->sp, otherwise builds lacking CONFIG_XFRM will break.

    Fixes: dce4551cb2ad ("udp: preserve head state for IP_CMSG_PASSSEC")
    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Paolo Abeni
     

26 Jul, 2017

2 commits

  • RFC 2465 defines ipv6IfStatsOutFragFails as:

    "The number of IPv6 datagrams that have been discarded
    because they needed to be fragmented at this output
    interface but could not be."

    The existing implementation, instead, would increase the counter
    twice in case we fail to allocate room for single fragments:
    once for the fragment, once for the datagram.

    This didn't look intentional though. In one of the two
    affected failure paths, the double increase was simply a result
    of a new 'goto fail' statement, introduced to avoid a skb leak.
    The other path appears to be affected since at least 2.6.12-rc2.

    Reported-by: Sabrina Dubroca
    Fixes: 1d325d217c7f ("ipv6: ip6_fragment: fix headroom tests and skb leak")
    Signed-off-by: Stefano Brivio
    Signed-off-by: David S. Miller

    Stefano Brivio
     
  • Paul Moore reported a SELinux/IP_PASSSEC regression
    caused by missing skb->sp at recvmsg() time. We need to
    preserve the skb head state to process the IP_CMSG_PASSSEC
    cmsg.

    With this commit we avoid releasing the skb head state in the
    BH even if a secpath is attached to the current skb, and store
    the skb status (with/without head states) in the scratch area,
    so that we can access it at skb deallocation time, without
    incurring cache-miss penalties.

    This also avoids misusing the skb CB for ipv6 packets,
    as introduced by the commit 0ddf3fb2c43d ("udp: preserve
    skb->dst if required for IP options processing").

    Clean up the scratch area helpers implementation a bit, to
    reduce the code differences between 32-bit and 64-bit builds.

    Reported-by: Paul Moore
    Fixes: 0a463c78d25b ("udp: avoid a cache miss on dequeue")
    Fixes: 0ddf3fb2c43d ("udp: preserve skb->dst if required for IP options processing")
    Signed-off-by: Paolo Abeni
    Tested-by: Paul Moore
    Signed-off-by: David S. Miller

    Paolo Abeni
     

25 Jul, 2017

4 commits

  • The mt7530 driver has its dsa_switch_ops::get_tag_protocol function
    check ds->cpu_port_mask to issue a warning in case the configured CPU
    port is not capable of supporting tags.

    After commit 14be36c2c96c ("net: dsa: Initialize all CPU and enabled
    ports masks in dsa_ds_parse()") we slightly re-arranged the
    initialization such that this was no longer working. Just make sure that
    ds->cpu_port_mask is set prior to the first call to get_tag_protocol,
    thus restoring the expected contract. In case of error, the CPU port bit
    is cleared.

    Fixes: 14be36c2c96c ("net: dsa: Initialize all CPU and enabled ports masks in dsa_ds_parse()")
    Reported-by: Sean Wang
    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Florian Fainelli
     
  • There are multiple reports showing we have a use-after-free in
    the timer prb_retire_rx_blk_timer_expired(), where we use struct
    tpacket_kbdq_core::pkbdq, a pg_vec, after it gets freed by
    free_pg_vec().

    The interesting part is it is not freed via packet_release() but
    via packet_setsockopt(), which means we are not closing the socket.
    Looking into the big and fat function packet_set_ring(), this could
    happen if we satisfy the following conditions:

    1. closing == 0, not on packet_release() path
    2. req->tp_block_nr == 0, we don't allocate a new pg_vec
    3. rx_ring->pg_vec is already set as V3, which means we already called
    packet_set_ring() with req->tp_block_nr > 0 previously
    4. req->tp_frame_nr == 0, pass sanity check
    5. po->mapped == 0, never called mmap()

    In this scenario we are clearing the old rx_ring->pg_vec, so we need
    to free this pg_vec, but we don't stop the timer on this path because
    of closing==0.

    The timer has to be stopped as long as we need to free pg_vec, therefore
    the check on closing!=0 is wrong, we should check pg_vec!=NULL instead.

    Thanks to liujian for testing different fixes.

    Reported-by: alexander.levin@verizon.com
    Reported-by: Dave Jones
    Reported-by: liujian (CE)
    Tested-by: liujian (CE)
    Cc: Ding Tianhong
    Cc: Willem de Bruijn
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    WANG Cong
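
The change of condition described above can be reduced to two predicates. This is a deliberately simplified sketch with hypothetical function names; the real packet_set_ring() takes more state into account (for example the ring version).

```c
#include <stddef.h>

/* Old condition: the timer is only stopped on the close path. */
static int must_stop_timer_old(int closing, const void *old_pg_vec)
{
    (void)old_pg_vec;
    return closing != 0;
}

/* Fixed condition: stop the timer whenever an old pg_vec will be freed. */
static int must_stop_timer_new(int closing, const void *old_pg_vec)
{
    (void)closing;
    return old_pg_vec != NULL;
}
```

In the setsockopt() scenario from the list above (closing == 0, an old pg_vec present), the old predicate leaves the timer running against memory that is about to be freed, while the new one stops it.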
     
  • Before the 'type' is validated, we shouldn't use it to fetch the
    ovs_ct_attr_lens's minlen and maxlen, else, out of bound access
    may happen.

    Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
    Signed-off-by: Liping Zhang
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Liping Zhang
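
The ordering issue described above (validate the index, then read the table) looks like this in a stand-alone form. The table contents, `ATTR_MAX` and `attr_ok()` are made up for the illustration and do not reflect the openvswitch attribute definitions.

```c
#include <stddef.h>

#define ATTR_MAX 3   /* hypothetical highest valid attribute type */

struct attr_len {
    size_t minlen;
    size_t maxlen;
};

/* Hypothetical per-type length limits. */
static const struct attr_len attr_lens[ATTR_MAX + 1] = {
    [0] = { 0, 0 },
    [1] = { 2, 2 },
    [2] = { 4, 4 },
    [3] = { 1, 16 },
};

/* Returns 1 if (type, len) is acceptable, 0 otherwise. */
static int attr_ok(unsigned int type, size_t len)
{
    /* Validate the index first ... */
    if (type > ATTR_MAX)
        return 0;
    /* ... and only then use it to read minlen and maxlen. */
    return len >= attr_lens[type].minlen && len <= attr_lens[type].maxlen;
}
```

Reading `attr_lens[type]` before the range check would index past the end of the table for an out-of-range type.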
     
  • The commit ffb07550c76f ("copy_msghdr_from_user(): get rid of
    field-by-field copyin") introduced a new sparse warning:

    net/socket.c:1919:27: warning: incorrect type in assignment (different address spaces)
    net/socket.c:1919:27: expected void *msg_control
    net/socket.c:1919:27: got void [noderef] *[addressable] msg_control

    and a line above 80 chars. Let's fix them.

    Fixes: ffb07550c76f ("copy_msghdr_from_user(): get rid of field-by-field copyin")
    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Paolo Abeni
     

22 Jul, 2017

1 commit

  • Pull NFS client bugfixes from Anna Schumaker:
    "Stable bugfix:
    - Fix error reporting regression

    Bugfixes:
    - Fix setting filelayout ds address race
    - Fix subtle access bug when using ACLs
    - Fix setting mnt3_counts array size
    - Fix a couple of pNFS commit races"

    * tag 'nfs-for-4.13-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
    NFS/filelayout: Fix racy setting of fl->dsaddr in filelayout_check_deviceid()
    NFS: Be more careful about mapping file permissions
    NFS: Store the raw NFS access mask in the inode's access cache
    NFSv3: Convert nfs3_proc_access() to use nfs_access_set_mask()
    NFS: Refactor NFS access to kernel access mask calculation
    net/sunrpc/xprt_sock: fix regression in connection error reporting.
    nfs: count correct array for mnt3_counts array size
    Revert commit 722f0b891198 ("pNFS: Don't send COMMITs to the DSes if...")
    pNFS/flexfiles: Handle expired layout segments in ff_layout_initiate_commit()
    NFS: Fix another COMMIT race in pNFS
    NFS: Fix a COMMIT race in pNFS
    mount: copy the port field into the cloned nfs_server structure.
    NFS: Don't run wake_up_bit() when nobody is waiting...
    nfs: add export operations

    Linus Torvalds
     

21 Jul, 2017

5 commits

  • Commit 3d4762639dd3 ("tcp: remove poll() flakes when receiving
    RST") in v4.12 changed the order in which ->sk_state_change()
    and ->sk_error_report() are called when a socket is shut
    down - sk_state_change() is now called first.

    This causes xs_tcp_state_change() -> xs_sock_mark_closed() ->
    xprt_disconnect_done() to wake all pending tasks with -EAGAIN.
    When the ->sk_error_report() callback arrives, it is too late to
    pass the error on, and it is lost.

    An easy way to demonstrate the problem is to try to start
    rpc.nfsd while rpcbind isn't running.
    nfsd will attempt a tcp connection to rpcbind. An ECONNREFUSED
    error is returned, but sunrpc code loses the error and keeps
    retrying. If it saw the ECONNREFUSED, it would abort.

    To fix this, handle the sk->sk_err in the TCP_CLOSE branch of
    xs_tcp_state_change().

    Fixes: 3d4762639dd3 ("tcp: remove poll() flakes when receiving RST")
    Cc: stable@vger.kernel.org (v4.12)
    Signed-off-by: NeilBrown
    Signed-off-by: Anna Schumaker

    NeilBrown
     
  • Pull networking fixes from David Miller:

    1) BPF verifier signed/unsigned value tracking fix, from Daniel
    Borkmann, Edward Cree, and Josef Bacik.

    2) Fix memory allocation length when setting up calls to
    ->ndo_set_mac_address, from Cong Wang.

    3) Add a new cxgb4 device ID, from Ganesh Goudar.

    4) Fix FIB refcount handling, we have to set its initial value before
    the configure callback (which can bump it). From David Ahern.

    5) Fix double-free in qcom/emac driver, from Timur Tabi.

    6) A bunch of gcc-7 string format overflow warning fixes from Arnd
    Bergmann.

    7) Fix link level headroom tests in ip_do_fragment(), from Vasily
    Averin.

    8) Fix chunk walking in SCTP when iterating over error and parameter
    headers. From Alexander Potapenko.

    9) TCP BBR congestion control fixes from Neal Cardwell.

    10) Fix SKB fragment handling in bcmgenet driver, from Doug Berger.

    11) BPF_CGROUP_RUN_PROG_SOCK_OPS needs to check for null __sk, from Cong
    Wang.

    12) xmit_recursion in ppp driver needs to be per-device not per-cpu,
    from Gao Feng.

    13) Cannot release skb->dst in UDP if IP options processing needs it.
    From Paolo Abeni.

    14) Some netdev ioctl ifr_name[] NULL termination fixes. From Alexander
    Levin and myself.

    15) Revert some rtnetlink notification changes that are causing
    regressions, from David Ahern.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (83 commits)
    net: bonding: Fix transmit load balancing in balance-alb mode
    rds: Make sure updates to cp_send_gen can be observed
    net: ethernet: ti: cpsw: Push the request_irq function to the end of probe
    ipv4: initialize fib_trie prior to register_netdev_notifier call.
    rtnetlink: allocate more memory for dev_set_mac_address()
    net: dsa: b53: Add missing ARL entries for BCM53125
    bpf: more tests for mixed signed and unsigned bounds checks
    bpf: add test for mixed signed and unsigned bounds checks
    bpf: fix up test cases with mixed signed/unsigned bounds
    bpf: allow to specify log level and reduce it for test_verifier
    bpf: fix mixed signed/unsigned derived min/max value bounds
    ipv6: avoid overflow of offset in ip6_find_1stfragopt
    net: tehuti: don't process data if it has not been copied from userspace
    Revert "rtnetlink: Do not generate notifications for CHANGEADDR event"
    net: dsa: mv88e6xxx: Enable CMODE config support for 6390X
    dt-binding: ptp: Add SoC compatibility strings for dte ptp clock
    NET: dwmac: Make dwmac reset unconditional
    net: Zero terminate ifr_name in dev_ifname().
    wireless: wext: terminate ifr name coming from userspace
    netfilter: fix netfilter_net_init() return
    ...

    Linus Torvalds
     
  • cp->cp_send_gen is treated as a normal variable, although it may be
    used by different threads.

    This is fixed by using {READ,WRITE}_ONCE when it is incremented and
    READ_ONCE when it is read outside the {acquire,release}_in_xmit
    protection.

    Normative reference from the Linux-Kernel Memory Model:

    Loads from and stores to shared (but non-atomic) variables should
    be protected with the READ_ONCE(), WRITE_ONCE(), and
    ACCESS_ONCE().

    Clause 5.1.2.4/25 in the C standard is also relevant.

    Signed-off-by: Håkon Bugge
    Reviewed-by: Knut Omang
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Håkon Bugge
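
The access pattern described above can be approximated in user space. The two macros below are simplified stand-ins for the kernel's READ_ONCE() and WRITE_ONCE() that work for scalar types only and rely on the GCC/Clang `__typeof__` extension; the struct and function names are hypothetical.

```c
/* Simplified approximations of READ_ONCE()/WRITE_ONCE() for scalars. */
#define READ_ONCE_SKETCH(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE_SKETCH(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

struct conn_path_sketch {
    unsigned int send_gen;   /* shared generation counter */
};

/* Increment the counter with exactly one load and one store. */
static unsigned int bump_send_gen(struct conn_path_sketch *cp)
{
    unsigned int gen = READ_ONCE_SKETCH(cp->send_gen) + 1;

    WRITE_ONCE_SKETCH(cp->send_gen, gen);
    return gen;
}

/* Re-read the counter outside the protected section and compare. */
static int send_gen_changed(struct conn_path_sketch *cp, unsigned int seen)
{
    return READ_ONCE_SKETCH(cp->send_gen) != seen;
}
```

The volatile accesses keep the compiler from caching, tearing or re-reading the value; they do not by themselves provide ordering or atomic read-modify-write semantics.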
     
  • Net stack initialization currently initializes fib-trie after the
    first call to netdevice_notifier(). In fact fib_trie initialization
    needs to happen before the first rtnl_register(). It does not cause any problem
    since there are no devices UP at this moment, but trying to bring 'lo'
    UP at initialization would make this assumption wrong and expose the issue.

    Fixes following crash

    Call Trace:
    ? alternate_node_alloc+0x76/0xa0
    fib_table_insert+0x1b7/0x4b0
    fib_magic.isra.17+0xea/0x120
    fib_add_ifaddr+0x7b/0x190
    fib_netdev_event+0xc0/0x130
    register_netdevice_notifier+0x1c1/0x1d0
    ip_fib_init+0x72/0x85
    ip_rt_init+0x187/0x1e9
    ip_init+0xe/0x1a
    inet_init+0x171/0x26c
    ? ipv4_offload_init+0x66/0x66
    do_one_initcall+0x43/0x160
    kernel_init_freeable+0x191/0x219
    ? rest_init+0x80/0x80
    kernel_init+0xe/0x150
    ret_from_fork+0x22/0x30
    Code: f6 46 23 04 74 86 4c 89 f7 e8 ae 45 01 00 49 89 c7 4d 85 ff 0f 85 7b ff ff ff 31 db eb 08 4c 89 ff e8 16 47 01 00 48 8b 44 24 38 8b 6e 14 4d 63 76 74 48 89 04 24 0f 1f 44 00 00 48 83 c4 08
    RIP: kmem_cache_alloc+0xcf/0x1c0 RSP: ffff9b1500017c28
    CR2: 0000000000000014

    Fixes: 7b1a74fdbb9e ("[NETNS]: Refactor fib initialization so it can handle multiple namespaces.")
    Fixes: 7f9b80529b8a ("[IPV4]: fib hash|trie initialization")

    Signed-off-by: Mahesh Bandewar
    Acked-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Mahesh Bandewar
     
  • virtnet_set_mac_address() interprets mac address as struct
    sockaddr, but upper layer only allocates dev->addr_len
    which is ETH_ALEN + sizeof(sa_family_t) in this case.

    We lack a unified definition for mac address, so just fix
    the upper layer, this also allows drivers to interpret it
    to struct sockaddr freely.

    Reported-by: David Ahern
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    WANG Cong
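
One way to picture the upper-layer fix is a buffer size that is never smaller than struct sockaddr. The formula below is an assumption made for illustration, and `mac_buf_len()` is a hypothetical helper; the commit text only states that the upper layer's allocation was too small for drivers that treat the buffer as a struct sockaddr.

```c
#include <stddef.h>
#include <sys/socket.h>

/*
 * Size of a buffer that holds the address family plus addr_len bytes of
 * hardware address, but is at least as large as struct sockaddr.
 */
static size_t mac_buf_len(size_t addr_len)
{
    size_t need = sizeof(sa_family_t) + addr_len;

    return need > sizeof(struct sockaddr) ? need : sizeof(struct sockaddr);
}
```

For a 6-byte Ethernet address this yields sizeof(struct sockaddr) rather than sizeof(sa_family_t) + 6, so a driver reading a full struct sockaddr stays inside the allocation.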
     

20 Jul, 2017

4 commits


19 Jul, 2017

4 commits

  • We accidentally return an uninitialized variable.

    Fixes: cf56c2f892a8 ("netfilter: remove old pre-netns era hook api")
    Signed-off-by: Dan Carpenter
    Acked-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Dan Carpenter
     
  • Pablo Neira Ayuso says:

    ====================
    Netfilter fixes for net

    The following patchset contains Netfilter fixes for your net tree,
    they are:

    1) Missing netlink message sanity check in nfnetlink, patch from
    Mateusz Jurczyk.

    2) We now have netfilter per-netns hooks, so let's kill global hook
    infrastructure, this infrastructure is known to be racy with netns.
    We don't care about out of tree modules. Patch from Florian Westphal.

    3) find_appropriate_src() is buggy when collisions happen after the
    conversion of the nat bysource to rhashtable. Also from Florian.

    4) Remove forward chain in nf_tables arp family, it's useless and it is
    causing quite a bit of confusion, from Florian Westphal.

    5) nf_ct_remove_expect() is called with the wrong parameter, causing
    kernel oops, patch from Florian Westphal.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Eric noticed that in udp_recvmsg() we still need to access
    skb->dst while processing the IP options.
    Since commit 0a463c78d25b ("udp: avoid a cache miss on dequeue")
    skb->dst is no longer available at recvmsg() time and bad things
    will happen if we enter the relevant code path.

    This commit addresses the issue by not clearing skb->dst if
    any IP options are present in the relevant skb.
    Since the IP CB is contained in the first skb cacheline, we can
    test it to decide to leverage the consume_stateless_skb()
    optimization, without measurable additional cost in the faster
    path.

    v1 -> v2: updated commit message tags

    Fixes: 0a463c78d25b ("udp: avoid a cache miss on dequeue")
    Reported-by: Andrey Konovalov
    Reported-by: Eric Dumazet
    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Paolo Abeni
     
  • KMSAN reported use of uninitialized memory in skb_set_hash_from_sk(),
    which originated from the TCP request socket created in
    cookie_v6_check():

    ==================================================================
    BUG: KMSAN: use of uninitialized memory in tcp_transmit_skb+0xf77/0x3ec0
    CPU: 1 PID: 2949 Comm: syz-execprog Not tainted 4.11.0-rc5+ #2931
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    TCP: request_sock_TCPv6: Possible SYN flooding on port 20028. Sending cookies. Check SNMP counters.
    Call Trace:

    __dump_stack lib/dump_stack.c:16
    dump_stack+0x172/0x1c0 lib/dump_stack.c:52
    kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:927
    __msan_warning_32+0x61/0xb0 mm/kmsan/kmsan_instr.c:469
    skb_set_hash_from_sk ./include/net/sock.h:2011
    tcp_transmit_skb+0xf77/0x3ec0 net/ipv4/tcp_output.c:983
    tcp_send_ack+0x75b/0x830 net/ipv4/tcp_output.c:3493
    tcp_delack_timer_handler+0x9a6/0xb90 net/ipv4/tcp_timer.c:284
    tcp_delack_timer+0x1b0/0x310 net/ipv4/tcp_timer.c:309
    call_timer_fn+0x240/0x520 kernel/time/timer.c:1268
    expire_timers kernel/time/timer.c:1307
    __run_timers+0xc13/0xf10 kernel/time/timer.c:1601
    run_timer_softirq+0x36/0xa0 kernel/time/timer.c:1614
    __do_softirq+0x485/0x942 kernel/softirq.c:284
    invoke_softirq kernel/softirq.c:364
    irq_exit+0x1fa/0x230 kernel/softirq.c:405
    exiting_irq+0xe/0x10 ./arch/x86/include/asm/apic.h:657
    smp_apic_timer_interrupt+0x5a/0x80 arch/x86/kernel/apic/apic.c:966
    apic_timer_interrupt+0x86/0x90 arch/x86/entry/entry_64.S:489
    RIP: 0010:native_restore_fl ./arch/x86/include/asm/irqflags.h:36
    RIP: 0010:arch_local_irq_restore ./arch/x86/include/asm/irqflags.h:77
    RIP: 0010:__msan_poison_alloca+0xed/0x120 mm/kmsan/kmsan_instr.c:440
    RSP: 0018:ffff880024917cd8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
    RAX: 0000000000000246 RBX: ffff8800224c0000 RCX: 0000000000000005
    RDX: 0000000000000004 RSI: ffff880000000000 RDI: ffffea0000b6d770
    RBP: ffff880024917d58 R08: 0000000000000dd8 R09: 0000000000000004
    R10: 0000160000000000 R11: 0000000000000000 R12: ffffffff85abf810
    R13: ffff880024917dd8 R14: 0000000000000010 R15: ffffffff81cabde4

    poll_select_copy_remaining+0xac/0x6b0 fs/select.c:293
    SYSC_select+0x4b4/0x4e0 fs/select.c:653
    SyS_select+0x76/0xa0 fs/select.c:634
    entry_SYSCALL_64_fastpath+0x13/0x94 arch/x86/entry/entry_64.S:204
    RIP: 0033:0x4597e7
    RSP: 002b:000000c420037ee0 EFLAGS: 00000246 ORIG_RAX: 0000000000000017
    RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004597e7
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
    RBP: 000000c420037ef0 R08: 000000c420037ee0 R09: 0000000000000059
    R10: 0000000000000000 R11: 0000000000000246 R12: 000000000042dc20
    R13: 00000000000000f3 R14: 0000000000000030 R15: 0000000000000003
    chained origin:
    save_stack_trace+0x37/0x40 arch/x86/kernel/stacktrace.c:59
    kmsan_save_stack_with_flags mm/kmsan/kmsan.c:302
    kmsan_save_stack mm/kmsan/kmsan.c:317
    kmsan_internal_chain_origin+0x12a/0x1f0 mm/kmsan/kmsan.c:547
    __msan_store_shadow_origin_4+0xac/0x110 mm/kmsan/kmsan_instr.c:259
    tcp_create_openreq_child+0x709/0x1ae0 net/ipv4/tcp_minisocks.c:472
    tcp_v6_syn_recv_sock+0x7eb/0x2a30 net/ipv6/tcp_ipv6.c:1103
    tcp_get_cookie_sock+0x136/0x5f0 net/ipv4/syncookies.c:212
    cookie_v6_check+0x17a9/0x1b50 net/ipv6/syncookies.c:245
    tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:989
    tcp_v6_do_rcv+0xdd8/0x1c60 net/ipv6/tcp_ipv6.c:1298
    tcp_v6_rcv+0x41a3/0x4f00 net/ipv6/tcp_ipv6.c:1487
    ip6_input_finish+0x82f/0x1ee0 net/ipv6/ip6_input.c:279
    NF_HOOK ./include/linux/netfilter.h:257
    ip6_input+0x239/0x290 net/ipv6/ip6_input.c:322
    dst_input ./include/net/dst.h:492
    ip6_rcv_finish net/ipv6/ip6_input.c:69
    NF_HOOK ./include/linux/netfilter.h:257
    ipv6_rcv+0x1dbd/0x22e0 net/ipv6/ip6_input.c:203
    __netif_receive_skb_core+0x2f6f/0x3a20 net/core/dev.c:4208
    __netif_receive_skb net/core/dev.c:4246
    process_backlog+0x667/0xba0 net/core/dev.c:4866
    napi_poll net/core/dev.c:5268
    net_rx_action+0xc95/0x1590 net/core/dev.c:5333
    __do_softirq+0x485/0x942 kernel/softirq.c:284
    origin:
    save_stack_trace+0x37/0x40 arch/x86/kernel/stacktrace.c:59
    kmsan_save_stack_with_flags mm/kmsan/kmsan.c:302
    kmsan_internal_poison_shadow+0xb1/0x1a0 mm/kmsan/kmsan.c:198
    kmsan_kmalloc+0x7f/0xe0 mm/kmsan/kmsan.c:337
    kmem_cache_alloc+0x1c2/0x1e0 mm/slub.c:2766
    reqsk_alloc ./include/net/request_sock.h:87
    inet_reqsk_alloc+0xa4/0x5b0 net/ipv4/tcp_input.c:6200
    cookie_v6_check+0x4f4/0x1b50 net/ipv6/syncookies.c:169
    tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:989
    tcp_v6_do_rcv+0xdd8/0x1c60 net/ipv6/tcp_ipv6.c:1298
    tcp_v6_rcv+0x41a3/0x4f00 net/ipv6/tcp_ipv6.c:1487
    ip6_input_finish+0x82f/0x1ee0 net/ipv6/ip6_input.c:279
    NF_HOOK ./include/linux/netfilter.h:257
    ip6_input+0x239/0x290 net/ipv6/ip6_input.c:322
    dst_input ./include/net/dst.h:492
    ip6_rcv_finish net/ipv6/ip6_input.c:69
    NF_HOOK ./include/linux/netfilter.h:257
    ipv6_rcv+0x1dbd/0x22e0 net/ipv6/ip6_input.c:203
    __netif_receive_skb_core+0x2f6f/0x3a20 net/core/dev.c:4208
    __netif_receive_skb net/core/dev.c:4246
    process_backlog+0x667/0xba0 net/core/dev.c:4866
    napi_poll net/core/dev.c:5268
    net_rx_action+0xc95/0x1590 net/core/dev.c:5333
    __do_softirq+0x485/0x942 kernel/softirq.c:284
    ==================================================================

    A similar error is reported for cookie_v4_check().

    Fixes: 58d607d3e52f ("tcp: provide skb->hash to synack packets")
    Signed-off-by: Alexander Potapenko
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Alexander Potapenko
     

17 Jul, 2017

2 commits

  • We crash in __nf_ct_expect_check: it calls nf_ct_remove_expect on the
    uninitialised expectation instead of the existing one, so del_timer chokes
    on a random memory address.

    Fixes: ec0e3f01114ad32711243 ("netfilter: nf_ct_expect: Add nf_ct_remove_expect()")
    Reported-by: Sergey Kvachonok
    Tested-by: Sergey Kvachonok
    Cc: Gao Feng
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • arp packets cannot be forwarded.

    They can be bridged, but then they can be filtered using
    either ebtables or nftables bridge family.

    The bridge netfilter exposes a "call-arptables" switch which
    pushes packets into arptables, but let's not expose this for nftables, so better
    close this asap.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal