07 Mar, 2017

3 commits

  • osd_request_timeout specifies how many seconds to wait for a response
    from OSDs before returning -ETIMEDOUT from an OSD request. 0 (default)
    means no limit.

    osd_request_timeout is osdkeepalive-precise -- in-flight requests are
    swept through every osdkeepalive seconds. With ack vs commit behaviour
    gone, abort_request() is really simple.

    This is based on a patch from Artur Molchanov.

    Tested-by: Artur Molchanov
    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil

    Ilya Dryomov
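
The periodic sweep described in the osd_request_timeout commit above can be illustrated with a minimal userspace sketch. The `struct req` type and `sweep_requests()` function are hypothetical stand-ins and not the libceph code; the sketch only assumes that the sweep runs once per keepalive interval, which is why a request can overshoot its timeout by up to one interval.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an in-flight OSD request. */
struct req {
    long sent_at;   /* time the request was sent, in seconds */
    int  result;    /* 0 while pending, negative errno once aborted */
};

/*
 * Sweep all in-flight requests once. Meant to be called every keepalive
 * interval. A timeout of 0 means "no limit", matching the default.
 * Returns the number of requests aborted by this sweep.
 */
static int sweep_requests(struct req *reqs, size_t n, long now, long timeout)
{
    int aborted = 0;

    if (timeout == 0)
        return 0;

    for (size_t i = 0; i < n; i++) {
        if (reqs[i].result == 0 && now - reqs[i].sent_at >= timeout) {
            reqs[i].result = -ETIMEDOUT;
            aborted++;
        }
    }
    return aborted;
}
```

With a 5 second timeout and a sweep every 5 seconds, a request sent just after a sweep is only noticed on the sweep after next, so the effective limit lies between one and two intervals.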
     
  • Since ceph.git commit 4e28f9e63644 ("osd/OSDMap: clear osd_info,
    osd_xinfo on osd deletion"), weight is set to IN when OSD is deleted.
    This changes the result of applying an incremental for clients, not
    just OSDs. Because CRUSH computations are obviously affected,
    pre-4e28f9e63644 servers disagree with post-4e28f9e63644 clients on
    object placement, resulting in misdirected requests.

    Mirrors ceph.git commit a6009d1039a55e2c77f431662b3d6cc5a8e8e63f.

    Fixes: 930c53286977 ("libceph: apply new_state before new_up_client on incrementals")
    Link: http://tracker.ceph.com/issues/19122
    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil

    Ilya Dryomov
     
  • Older (shorter) CRUSH maps too need to be finalized.

    Fixes: 66a0e2d579db ("crush: remove mutable part of CRUSH map")
    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     

05 Mar, 2017

1 commit

  • Pull networking fixes from David Miller:

    1) Fix double-free in batman-adv, from Sven Eckelmann.

    2) Fix packet stats for fast-RX path, from Johannes Berg.

    3) Netfilter's ip_route_me_harder() doesn't handle request sockets
    properly, fix from Florian Westphal.

    4) Fix sendmsg deadlock in rxrpc, from David Howells.

    5) Add missing RCU locking to transport hashtable scan, from Xin Long.

    6) Fix potential packet loss in mlxsw driver, from Ido Schimmel.

    7) Fix race in NAPI handling between poll handlers and busy polling,
    from Eric Dumazet.

    8) TX path in vxlan and geneve needs proper RCU locking, from Jakub
    Kicinski.

    9) SYN processing in DCCP and TCP needs to disable BH, from Eric
    Dumazet.

    10) Properly handle net_enable_timestamp() being invoked from IRQ
    context, also from Eric Dumazet.

    11) Fix crash on device-tree systems in xgene driver, from Alban Bedel.

    12) Do not call sk_free() on a locked socket, from Arnaldo Carvalho de
    Melo.

    13) Fix use-after-free in netvsc driver, from Dexuan Cui.

    14) Fix max MTU setting in bonding driver, from WANG Cong.

    15) xen-netback hash table can be allocated from softirq context, so use
    GFP_ATOMIC. From Anoob Soman.

    16) Fix MAC address change bug in bgmac driver, from Hari Vyas.

    17) strparser needs to destroy strp_wq on module exit, from WANG Cong.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (69 commits)
    strparser: destroy workqueue on module exit
    sfc: fix IPID endianness in TSOv2
    sfc: avoid max() in array size
    rds: remove unnecessary returned value check
    rxrpc: Fix potential NULL-pointer exception
    nfp: correct DMA direction in XDP DMA sync
    nfp: don't tell FW about the reserved buffer space
    net: ethernet: bgmac: mac address change bug
    net: ethernet: bgmac: init sequence bug
    xen-netback: don't vfree() queues under spinlock
    xen-netback: keep a local pointer for vif in backend_disconnect()
    netfilter: nf_tables: don't call nfnetlink_set_err() if nfnetlink_send() fails
    netfilter: nft_set_rbtree: incorrect assumption on lower interval lookups
    netfilter: nf_conntrack_sip: fix wrong memory initialisation
    can: flexcan: fix typo in comment
    can: usb_8dev: Fix memory leak of priv->cmd_msg_buffer
    can: gs_usb: fix coding style
    can: gs_usb: Don't use stack memory for USB transfers
    ixgbe: Limit use of 2K buffers on architectures with 256B or larger cache lines
    ixgbe: update the rss key on h/w, when ethtool ask for it
    ...

    Linus Torvalds
     

04 Mar, 2017

6 commits

  • Pull misc final vfs updates from Al Viro:
    "A few unrelated patches that got beating in -next.

    Everything else will have to go into the next window ;-/"

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    hfs: fix hfs_readdir()
    selftest for default_file_splice_read() infoleak
    9p: constify ->d_name handling

    Linus Torvalds
     
  • Fixes: 43a0c6751a32 ("strparser: Stream parser for messages")
    Cc: Tom Herbert
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    WANG Cong
     
  • Pablo Neira Ayuso says:

    ====================
    Netfilter fixes for net

    The following patchset contains Netfilter fixes for your net tree,
    they are:

    1) Missing check for full sock in ip_route_me_harder(), from
    Florian Westphal.

    2) Incorrect sip helper structure initialization that breaks it when
    several ports are used, from Christophe Leroy.

    3) Fix incorrect assumption when looking up for matching with adjacent
    intervals in the nft_set_rbtree.

    4) Fix broken netlink event error reporting in nf_tables that results
    in misleading ESRCH errors propagated to userspace listeners.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Pull sched.h split-up from Ingo Molnar:
    "The point of these changes is to significantly reduce the
    header footprint, to speed up the kernel build and to
    have a cleaner header structure.

    After these changes the new <linux/sched.h>'s typical preprocessed
    size goes down from a previous ~0.68 MB (~22K lines) to ~0.45 MB (~15K
    lines), which is around 40% faster to build on typical configs.

    Not much changed from the last version (-v2) posted three weeks ago: I
    eliminated quirks, backmerged fixes plus I rebased it to an upstream
    SHA1 from yesterday that includes most changes queued up in -next plus
    all sched.h changes that were pending from Andrew.

    I've re-tested the series both on x86 and on cross-arch defconfigs,
    and did a bisectability test at a number of random points.

    I tried to test as many build configurations as possible, but some
    build breakage is probably still left - but it should be mostly
    limited to architectures that have no cross-compiler binaries
    available on kernel.org, and non-default configurations"

    * 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (146 commits)
    sched/headers: Clean up
    sched/headers: Remove #ifdefs from
    sched/headers: Remove the include from
    sched/headers, hrtimer: Remove the include from
    sched/headers, x86/apic: Remove the header inclusion from
    sched/headers, timers: Remove the include from
    sched/headers: Remove from
    sched/headers: Remove from
    sched/core: Remove unused prefetch_stack()
    sched/headers: Remove from
    sched/headers: Remove the 'init_pid_ns' prototype from
    sched/headers: Remove from
    sched/headers: Remove from
    sched/headers: Remove the runqueue_is_locked() prototype
    sched/headers: Remove from
    sched/headers: Remove from
    sched/headers: Remove from
    sched/headers: Remove from
    sched/headers: Remove the include from
    sched/headers: Remove from
    ...

    Linus Torvalds
     
  • The function rds_trans_register always returns 0. As such, it is not
    necessary to check the returned value.

    Cc: Joe Jin
    Cc: Junxiao Bi
    Signed-off-by: Zhu Yanjun
    Reviewed-by: Yuval Shaia
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Zhu Yanjun
     
  • Fix a potential NULL-pointer exception in rxrpc_do_sendmsg(). The call
    state check that I added should have gone into the else-body of the
    if-statement where we actually have a call to check.

    Found by CoverityScan CID#1414316 ("Dereference after null check").

    Fixes: 540b1c48c37a ("rxrpc: Fix deadlock between call creation and sendmsg/recvmsg")
    Reported-by: Colin Ian King
    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    David Howells
     

03 Mar, 2017

15 commits

  • The underlying nlmsg_multicast() already sets sk->sk_err for us to
    notify socket overruns, so we should not do anything with this return
    value. So we just call nfnetlink_set_err() if:

    1) We fail to allocate the netlink message.

    or

    2) We don't have enough space in the netlink message to place attributes,
    which means that we likely need to allocate a larger message.

    Before this patch, the internal ESRCH netlink error code was propagated
    to userspace, which is quite misleading. Netlink semantics mandate that
    listeners just hit ENOBUFS if the socket buffer overruns.

    Reported-by: Alexander Alemayhu
    Tested-by: Alexander Alemayhu
    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • In case of adjacent ranges, we may indeed see either the high part of
    the range in first place or the low part of it. Remove this incorrect
    assumption, let's make sure we annotate the low part of the interval in
    case we have adjacent intervals so we hit a matching in
    lookups.

    Reported-by: Simon Hanisch
    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • In commit 82de0be6862cd ("netfilter: Add helper array
    register/unregister functions"),
    struct nf_conntrack_helper sip[MAX_PORTS][4] was changed to
    sip[MAX_PORTS * 4], so the memory init should have been changed to
    memset(&sip[4 * i], 0, 4 * sizeof(sip[i]));

    But as the sip[] table is allocated in the BSS, it is already set to 0.

    Fixes: 82de0be6862cd ("netfilter: Add helper array register/unregister functions")
    Signed-off-by: Christophe Leroy
    Signed-off-by: Pablo Neira Ayuso

    Christophe Leroy
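
The indexing change in the sip[] commit above can be shown in isolation. This is a minimal sketch: `struct helper` is a hypothetical stand-in for struct nf_conntrack_helper, the value of MAX_PORTS is assumed, and `clear_port_group()` is an illustrative name. Once the array is flattened from sip[MAX_PORTS][4] to sip[MAX_PORTS * 4], the four entries for port slot i start at index 4 * i.

```c
#include <string.h>

#define MAX_PORTS 8   /* assumed value, for illustration only */

/* Hypothetical stand-in for struct nf_conntrack_helper. */
struct helper {
    int port;
};

static struct helper sip[MAX_PORTS * 4];

/* Clear the group of four helpers that belongs to port slot i. */
static void clear_port_group(int i)
{
    memset(&sip[4 * i], 0, 4 * sizeof(sip[i]));
}
```

Using `&sip[i]` with `sizeof(sip[i])` on the flattened array would clear a single element at the wrong offset instead of the group of four.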
     
  • …sors into <linux/sched/signal.h>

    task_struct::signal and task_struct::sighand are pointers, which would normally make it
    straightforward to not define those types in sched.h.

    That is not so, because the types are accompanied by a myriad of APIs (macros and inline
    functions) that dereference them.

    Split the types and the APIs out of sched.h and move them into a new header, <linux/sched/signal.h>.

    With this change sched.h does not know about 'struct signal' and 'struct sighand' anymore,
    trying to put accessors into sched.h as a test fails the following way:

    ./include/linux/sched.h: In function ‘test_signal_types’:
    ./include/linux/sched.h:2461:18: error: dereferencing pointer to incomplete type ‘struct signal_struct’
    ^

    This reduces the size and complexity of sched.h significantly.

    Update all headers and .c code that relied on getting the signal handling
    functionality from <linux/sched.h> to include <linux/sched/signal.h>.

    The list of affected files in the preparatory patch was partly generated by
    grepping for the APIs, and partly by doing coverage build testing, both
    all[yes|mod|def|no]config builds on 64-bit and 32-bit x86, and an array of
    cross-architecture builds.

    Nevertheless some (trivial) build breakage is still expected related to rare
    Kconfig combinations and in-flight patches to various kernel code, but most
    of it should be handled by this patch.

    Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
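
The incomplete-type behaviour that the signal.h commit above relies on can be reproduced in a few lines of plain C. This is a simplified sketch: `struct task` and the `nr_threads()` accessor are hypothetical names, and the single-field `struct signal_struct` is not the kernel's definition.

```c
/* What a slimmed-down sched.h would see: a forward declaration only. */
struct signal_struct;

struct task {
    struct signal_struct *signal;   /* storing a pointer needs no definition */
};

/*
 * An accessor placed here would fail to compile with "dereferencing
 * pointer to incomplete type", because the members are not yet known:
 *
 *   static int nr_threads(const struct task *t)
 *   {
 *       return t->signal->nr_threads;
 *   }
 */

/* What the split-out header would provide: the full type plus accessors. */
struct signal_struct {
    int nr_threads;
};

static int nr_threads(const struct task *t)
{
    return t->signal->nr_threads;
}
```

Code that only passes the pointer around can keep including the smaller header; only code that dereferences it needs the header that defines the type.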
     
  • Pull vfs sendmsg updates from Al Viro:
    "More sendmsg work.

    This is a fairly separate isolated stuff (there's a continuation
    around lustre, but that one was too late to soak in -next), thus the
    separate pull request"

    * 'work.sendmsg' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    ncpfs: switch to sock_sendmsg()
    ncpfs: don't mess with manually advancing iovec on send
    ncpfs: sendmsg does *not* bugger iovec these days
    ceph_tcp_sendpage(): use ITER_BVEC sendmsg
    afs_send_pages(): use ITER_BVEC
    rds: remove dead code
    ceph: switch to sock_recvmsg()
    usbip_recv(): switch to sock_recvmsg()
    iscsi_target: deal with short writes on the tx side
    [nbd] pass iov_iter to nbd_xmit()
    [nbd] switch sock_xmit() to sock_{send,recv}msg()
    [drbd] use sock_sendmsg()

    Linus Torvalds
     
  • …kernel/git/jberg/mac80211

    Johannes Berg says:

    ====================
    This contains just the average.h change in order to get it
    into the tree before adding new users through -next trees.
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>

    David S. Miller
     
  • Like commit 1f17e2f2c8a8 ("net: ipv6: ignore null_entry on route dumps"),
    we need to ignore null entry in inet6_rtm_getroute() too.

    Return -ENETUNREACH here to sync with IPv4 behavior, as suggested by David.

    Fixes: a1a22c1206 ("net: ipv6: Keep nexthop of multipath route on admin down")
    Reported-by: Dmitry Vyukov
    Cc: David Ahern
    Signed-off-by: Cong Wang
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    WANG Cong
     
  • tp->fastopen_req could potentially be double freed if a malicious
    user does the following:
    1. Enable TCP_FASTOPEN_CONNECT sockopt and do a connect() on the socket.
    2. Call connect() with AF_UNSPEC to disconnect the socket.
    3. Make this socket a listening socket by calling listen().
    4. Accept incoming connections and generate child sockets. All child
    sockets will get a copy of the pointer of fastopen_req.
    5. Call close() on all sockets. fastopen_req will get freed multiple
    times.

    Fixes: 19f6d3f3c842 ("net/tcp-fastopen: Add new API support")
    Reported-by: Andrey Konovalov
    Signed-off-by: Wei Wang
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Wei Wang
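
The ownership problem behind the fastopen_req double free above is a general one: a byte-wise clone copies an owned pointer, so two objects later free the same block. The sketch below shows the pattern in userspace with hypothetical names (`struct sock_like`, `clone_sock()`, `close_sock()`); it is not the kernel fix, only one way to keep a single owner.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a socket that owns a heap-allocated request. */
struct sock_like {
    int   listening;
    void *fastopen_req;   /* owned pointer: must have exactly one owner */
};

/* Clone a parent; the child must not inherit ownership of the request. */
static void clone_sock(struct sock_like *child, const struct sock_like *parent)
{
    memcpy(child, parent, sizeof(*child));
    child->fastopen_req = NULL;   /* otherwise both would free the same block */
}

static void close_sock(struct sock_like *sk)
{
    free(sk->fastopen_req);       /* free(NULL) is a no-op */
    sk->fastopen_req = NULL;
}
```

Closing the child and then the parent frees the request exactly once, because only the parent still holds the pointer.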
     
  • Pull vhost updates from Michael Tsirkin:
    "virtio, vhost: optimizations, fixes

    Looks like a quiet cycle for vhost/virtio, just a couple of minor
    tweaks. Most notable is automatic interrupt affinity for blk and scsi.
    Hopefully other devices are not far behind"

    * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
    virtio-console: avoid DMA from stack
    vhost: introduce O(1) vq metadata cache
    virtio_scsi: use virtio IRQ affinity
    virtio_blk: use virtio IRQ affinity
    blk-mq: provide a default queue mapping for virtio device
    virtio: provide a method to get the IRQ affinity mask for a virtqueue
    virtio: allow drivers to request IRQ affinity when creating VQs
    virtio_pci: simplify MSI-X setup
    virtio_pci: don't duplicate the msix_enable flag in struct pci_dev
    virtio_pci: use shared interrupts for virtqueues
    virtio_pci: remove struct virtio_pci_vq_info
    vhost: try avoiding avail index access when getting descriptor
    virtio_mmio: expose header to userspace

    Linus Torvalds
     
  • Pull security subsystem fixes from James Morris:
    "Two fixes for the security subsystem:

    - keys: split both rcu_dereference_key() and user_key_payload() into
    versions which can be called with or without holding the key
    semaphore.

    - SELinux: fix Android init(8) breakage due to new cgroup security
    labeling support when using older policy"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
    selinux: wrap cgroup seclabel support with its own policy capability
    KEYS: Differentiate uses of rcu_dereference_key() and user_key_payload()

    Linus Torvalds
     
  • When handling problems in cloning a socket with the sk_clone_locked()
    function we need to perform several steps that were open coded in it and
    its callers, so introduce a routine to avoid this duplication:
    sk_free_unlock_clone().

    Cc: Cong Wang
    Cc: Dmitry Vyukov
    Cc: Eric Dumazet
    Cc: Gerrit Renker
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/n/net-ui6laqkotycunhtmqryl9bfx@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
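
The idea of a single error-path helper that unlocks a freshly cloned object before freeing it, as in the sk_free_unlock_clone() commit above, can be sketched in userspace. All names here (`struct obj`, `obj_free_unlock_clone()`, `obj_clone_locked()`) are hypothetical, the integer flag merely stands in for the socket lock, and clearing the inherited destructor is an assumption about what such a helper would want to do.

```c
#include <stdlib.h>

struct obj {
    int locked;                      /* stand-in for the socket spinlock */
    void (*destruct)(struct obj *);  /* destructor copied from the parent */
};

static int freed_while_locked;       /* counts the bug the helper prevents */

static void obj_free(struct obj *o)
{
    if (o->locked)
        freed_while_locked++;
    if (o->destruct)
        o->destruct(o);
    free(o);
}

/* Error-path helper: drop the lock first, then free the clone. */
static void obj_free_unlock_clone(struct obj *o)
{
    o->destruct = NULL;   /* still a raw copy: skip the parent's destructor */
    o->locked = 0;
    obj_free(o);
}

/* Returns a locked clone, or NULL if allocation or setup fails. */
static struct obj *obj_clone_locked(const struct obj *parent, int fail)
{
    struct obj *o = malloc(sizeof(*o));

    if (!o)
        return NULL;
    *o = *parent;
    o->locked = 1;
    if (fail) {
        obj_free_unlock_clone(o);
        return NULL;
    }
    return o;
}
```

Routing every failure through one helper means callers cannot forget the unlock step, which is the kind of omission that a "held lock freed" report points at.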
     
  • The code where sk_clone() came from created a new socket and locked it,
    but then, on the error path didn't unlock it.

    This problem stayed there for a long while, till b0691c8ee7c2 ("net:
    Unlock sock before calling sk_free()") fixed it, but unfortunately the
    callers of sk_clone() (now sk_clone_locked()) were not audited and the
    one in dccp_create_openreq_child() remained.

    Now in the age of the syzkaller fuzzer, this was finally uncovered, as
    reported by Dmitry:

    ---- 8< ----

    I've got the following report while running syzkaller fuzzer on
    86292b33d4b7 ("Merge branch 'akpm' (patches from Andrew)")

    [ BUG: held lock freed! ]
    4.10.0+ #234 Not tainted
    -------------------------
    syz-executor6/6898 is freeing memory
    ffff88006286cac0-ffff88006286d3b7, with a lock still held there!
    (slock-AF_INET6){+.-...}, at: [] spin_lock
    include/linux/spinlock.h:299 [inline]
    (slock-AF_INET6){+.-...}, at: []
    sk_clone_lock+0x3d9/0x12c0 net/core/sock.c:1504
    5 locks held by syz-executor6/6898:
    #0: (sk_lock-AF_INET6){+.+.+.}, at: [] lock_sock
    include/net/sock.h:1460 [inline]
    #0: (sk_lock-AF_INET6){+.+.+.}, at: []
    inet_stream_connect+0x44/0xa0 net/ipv4/af_inet.c:681
    #1: (rcu_read_lock){......}, at: []
    inet6_csk_xmit+0x12a/0x5d0 net/ipv6/inet6_connection_sock.c:126
    #2: (rcu_read_lock){......}, at: [] __skb_unlink
    include/linux/skbuff.h:1767 [inline]
    #2: (rcu_read_lock){......}, at: [] __skb_dequeue
    include/linux/skbuff.h:1783 [inline]
    #2: (rcu_read_lock){......}, at: []
    process_backlog+0x264/0x730 net/core/dev.c:4835
    #3: (rcu_read_lock){......}, at: []
    ip6_input_finish+0x0/0x1700 net/ipv6/ip6_input.c:59
    #4: (slock-AF_INET6){+.-...}, at: [] spin_lock
    include/linux/spinlock.h:299 [inline]
    #4: (slock-AF_INET6){+.-...}, at: []
    sk_clone_lock+0x3d9/0x12c0 net/core/sock.c:1504

    Fix it just like was done by b0691c8ee7c2 ("net: Unlock sock before calling
    sk_free()").

    Reported-by: Dmitry Vyukov
    Cc: Cong Wang
    Cc: Eric Dumazet
    Cc: Gerrit Renker
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20170301153510.GE15145@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Simon Wunderlich says:

    ====================
    Here are two batman-adv bugfixes:

    - fix a potential double free when fragment merges fail,
    by Sven Eckelmann

    - fix failing transmission of the 16th (last) fragment if that exists,
    by Linus Lüssing
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Fixed a brace coding style warning reported by checkpatch.pl

    Signed-off-by: Peter Downs
    Signed-off-by: David S. Miller

    Peter Downs
     
  • Andrey reported a NULL pointer deref bug in ipv6_route_ioctl()
    -> ip6_route_del() -> __ip6_del_rt_siblings() code path. This is
    because ip6_null_entry is returned in this path since ip6_null_entry
    is kinda default for a ipv6 route table root node. Quote from
    David Ahern:

    ip6_null_entry is the root of all ipv6 fib tables making it integrated
    into the table ...

    We should ignore any attempt of trying to delete it, like we do in
    __ip6_del_rt() path and several others.

    Reported-by: Andrey Konovalov
    Fixes: 0ae8133586ad ("net: ipv6: Allow shorthand delete of all nexthops in multipath route")
    Cc: David Ahern
    Cc: Eric Dumazet
    Signed-off-by: Cong Wang
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    WANG Cong
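
The guard described in the ip6_null_entry commit above amounts to refusing to delete a sentinel that is always part of the table. A minimal sketch of that pattern follows; `struct route`, `null_entry` and `route_del()` are hypothetical names, and the -ENOENT return value is an assumption made for illustration.

```c
#include <errno.h>
#include <stddef.h>

struct route {
    struct route *next;
    int id;
};

/* Sentinel that is always present at the root of the table. */
static struct route null_entry = { NULL, 0 };

/* Delete a route, but never the sentinel. */
static int route_del(struct route *rt)
{
    if (rt == NULL || rt == &null_entry)
        return -ENOENT;   /* the sentinel is not a real route */

    rt->next = NULL;      /* stand-in for the real unlink work */
    return 0;
}
```

Checking for the sentinel at the top of the delete path keeps later code from walking sibling lists that the sentinel does not have.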
     

02 Mar, 2017

15 commits

  • But first update the code that uses these facilities with the
    new header.

    Acked-by: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • We don't actually need the full rculist.h header in sched.h anymore,
    we will be able to include the smaller rcupdate.h header instead.

    But first update code that relied on the implicit header inclusion.

    Acked-by: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Update the .c files that depend on these APIs.

    Acked-by: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • …hed.h> into <linux/sched/signal.h>

    Fix up affected files that include this signal functionality via sched.h.

    Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     
  • Add #include dependencies to all .c files that rely on sched.h
    doing that for them.

    Note that even if the count where we need to add extra headers seems high,
    it's still a net win, because <linux/sched.h> is included in over
    2,200 files ...

    Acked-by: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • We are going to split out of , which
    will have to be picked up from other headers and a couple of .c files.

    Create a trivial placeholder file that just
    maps to to make this patch obviously correct and
    bisectable.

    Include the new header in the files that are going to need it.

    Acked-by: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • We are going to split out of , which
    will have to be picked up from other headers and a couple of .c files.

    Create a trivial placeholder file that just
    maps to to make this patch obviously correct and
    bisectable.

    Include the new header in the files that are going to need it.

    Acked-by: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • We are going to split out of , which
    will have to be picked up from a couple of .c files.

    Create a trivial placeholder file that just
    maps to to make this patch obviously correct and
    bisectable.

    Include the new header in the files that are going to need it.

    Acked-by: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • We are going to split out of , which
    will have to be picked up from other headers and .c files.

    Create a trivial placeholder file that just
    maps to to make this patch obviously correct and
    bisectable.

    Include the new header in the files that are going to need it.

    Acked-by: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Declaring the factor is counter-intuitive, and people are prone
    to using small(-ish) values even when that makes no sense.

    Change the DECLARE_EWMA() macro to take the fractional precision,
    in bits, rather than a factor, and update all users.

    While at it, add some more documentation.

    Acked-by: David S. Miller
    Signed-off-by: Johannes Berg

    Johannes Berg
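
To make "fractional precision, in bits" from the DECLARE_EWMA() commit above concrete, here is a small fixed-point moving average in userspace C. It is a sketch under stated assumptions and not the kernel macro: the names `struct ewma`, `ewma_add()` and `ewma_read()`, and the values of PRECISION and WEIGHT_LOG2, are chosen for illustration.

```c
#include <stdint.h>

#define PRECISION   10   /* fractional bits kept internally (assumed value) */
#define WEIGHT_LOG2 2    /* each new sample gets weight 1/4 (assumed value) */

struct ewma {
    uint64_t internal;   /* average scaled by 2^PRECISION; 0 means empty */
};

static void ewma_add(struct ewma *e, uint64_t val)
{
    uint64_t cur = e->internal;

    if (cur == 0)
        e->internal = val << PRECISION;   /* first sample seeds the average */
    else
        e->internal = (((cur << WEIGHT_LOG2) - cur) + (val << PRECISION))
                      >> WEIGHT_LOG2;     /* (3 * cur + new) / 4 */
}

static uint64_t ewma_read(const struct ewma *e)
{
    return e->internal >> PRECISION;      /* drop the fractional bits */
}
```

Adding 100 and then 200 yields (3 * 100 + 200) / 4 = 125. The precision only controls how many fractional bits survive between updates, which is why stating it directly is less error-prone than choosing a scaling factor.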
     
  • Andrey reported a use-after-free in IPv6 stack.

    Issue here is that we free the socket while it still has skb
    in TX path and in some queues.

    It happens here because IPv6 reassembly unit messes skb->truesize,
    breaking skb_set_owner_w() badly.

    We fixed a similar issue for IPV4 in commit 8282f27449bf ("inet: frag:
    Always orphan skbs inside ip_defrag()")
    Acked-by: Joe Stringer

    ==================================================================
    BUG: KASAN: use-after-free in sock_wfree+0x118/0x120
    Read of size 8 at addr ffff880062da0060 by task a.out/4140

    page:ffffea00018b6800 count:1 mapcount:0 mapping: (null)
    index:0x0 compound_mapcount: 0
    flags: 0x100000000008100(slab|head)
    raw: 0100000000008100 0000000000000000 0000000000000000 0000000180130013
    raw: dead000000000100 dead000000000200 ffff88006741f140 0000000000000000
    page dumped because: kasan: bad access detected

    CPU: 0 PID: 4140 Comm: a.out Not tainted 4.10.0-rc3+ #59
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:15
    dump_stack+0x292/0x398 lib/dump_stack.c:51
    describe_address mm/kasan/report.c:262
    kasan_report_error+0x121/0x560 mm/kasan/report.c:370
    kasan_report mm/kasan/report.c:392
    __asan_report_load8_noabort+0x3e/0x40 mm/kasan/report.c:413
    sock_flag ./arch/x86/include/asm/bitops.h:324
    sock_wfree+0x118/0x120 net/core/sock.c:1631
    skb_release_head_state+0xfc/0x250 net/core/skbuff.c:655
    skb_release_all+0x15/0x60 net/core/skbuff.c:668
    __kfree_skb+0x15/0x20 net/core/skbuff.c:684
    kfree_skb+0x16e/0x4e0 net/core/skbuff.c:705
    inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304
    inet_frag_put ./include/net/inet_frag.h:133
    nf_ct_frag6_gather+0x1125/0x38b0 net/ipv6/netfilter/nf_conntrack_reasm.c:617
    ipv6_defrag+0x21b/0x350 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
    nf_hook_entry_hookfn ./include/linux/netfilter.h:102
    nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310
    nf_hook ./include/linux/netfilter.h:212
    __ip6_local_out+0x52c/0xaf0 net/ipv6/output_core.c:160
    ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170
    ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722
    ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742
    rawv6_push_pending_frames net/ipv6/raw.c:613
    rawv6_sendmsg+0x2cff/0x4130 net/ipv6/raw.c:927
    inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
    sock_sendmsg_nosec net/socket.c:635
    sock_sendmsg+0xca/0x110 net/socket.c:645
    sock_write_iter+0x326/0x620 net/socket.c:848
    new_sync_write fs/read_write.c:499
    __vfs_write+0x483/0x760 fs/read_write.c:512
    vfs_write+0x187/0x530 fs/read_write.c:560
    SYSC_write fs/read_write.c:607
    SyS_write+0xfb/0x230 fs/read_write.c:599
    entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:203
    RIP: 0033:0x7ff26e6f5b79
    RSP: 002b:00007ff268e0ed98 EFLAGS: 00000206 ORIG_RAX: 0000000000000001
    RAX: ffffffffffffffda RBX: 00007ff268e0f9c0 RCX: 00007ff26e6f5b79
    RDX: 0000000000000010 RSI: 0000000020f50fe1 RDI: 0000000000000003
    RBP: 00007ff26ebc1220 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
    R13: 00007ff268e0f9c0 R14: 00007ff26efec040 R15: 0000000000000003

    The buggy address belongs to the object at ffff880062da0000
    which belongs to the cache RAWv6 of size 1504
    The buggy address ffff880062da0060 is located 96 bytes inside
    of 1504-byte region [ffff880062da0000, ffff880062da05e0)

    Freed by task 4113:
    save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
    save_stack+0x43/0xd0 mm/kasan/kasan.c:502
    set_track mm/kasan/kasan.c:514
    kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:578
    slab_free_hook mm/slub.c:1352
    slab_free_freelist_hook mm/slub.c:1374
    slab_free mm/slub.c:2951
    kmem_cache_free+0xb2/0x2c0 mm/slub.c:2973
    sk_prot_free net/core/sock.c:1377
    __sk_destruct+0x49c/0x6e0 net/core/sock.c:1452
    sk_destruct+0x47/0x80 net/core/sock.c:1460
    __sk_free+0x57/0x230 net/core/sock.c:1468
    sk_free+0x23/0x30 net/core/sock.c:1479
    sock_put ./include/net/sock.h:1638
    sk_common_release+0x31e/0x4e0 net/core/sock.c:2782
    rawv6_close+0x54/0x80 net/ipv6/raw.c:1214
    inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425
    inet6_release+0x50/0x70 net/ipv6/af_inet6.c:431
    sock_release+0x8d/0x1e0 net/socket.c:599
    sock_close+0x16/0x20 net/socket.c:1063
    __fput+0x332/0x7f0 fs/file_table.c:208
    ____fput+0x15/0x20 fs/file_table.c:244
    task_work_run+0x19b/0x270 kernel/task_work.c:116
    exit_task_work ./include/linux/task_work.h:21
    do_exit+0x186b/0x2800 kernel/exit.c:839
    do_group_exit+0x149/0x420 kernel/exit.c:943
    SYSC_exit_group kernel/exit.c:954
    SyS_exit_group+0x1d/0x20 kernel/exit.c:952
    entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:203

    Allocated by task 4115:
    save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
    save_stack+0x43/0xd0 mm/kasan/kasan.c:502
    set_track mm/kasan/kasan.c:514
    kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:605
    kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:544
    slab_post_alloc_hook mm/slab.h:432
    slab_alloc_node mm/slub.c:2708
    slab_alloc mm/slub.c:2716
    kmem_cache_alloc+0x1af/0x250 mm/slub.c:2721
    sk_prot_alloc+0x65/0x2a0 net/core/sock.c:1334
    sk_alloc+0x105/0x1010 net/core/sock.c:1396
    inet6_create+0x44d/0x1150 net/ipv6/af_inet6.c:183
    __sock_create+0x4f6/0x880 net/socket.c:1199
    sock_create net/socket.c:1239
    SYSC_socket net/socket.c:1269
    SyS_socket+0xf9/0x230 net/socket.c:1249
    entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:203

    Memory state around the buggy address:
    ffff880062d9ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    ffff880062d9ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    >ffff880062da0000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ^
    ffff880062da0080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ffff880062da0100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ==================================================================

    Reported-by: Andrey Konovalov
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • It is now very clear that silly TCP listeners might play with
    enabling/disabling timestamping while new children are added
    to their accept queue.

    Meaning net_enable_timestamp() can be called from BH context
    while current state of the static key is not enabled.

    Let's play safe and allow all contexts.

    The work queue is scheduled only under the problematic cases,
    which are the static key enable/disable transition, to not slow down
    critical paths.

    This extends and improves what we did in commit 5fa8bbda38c6 ("net: use
    a work queue to defer net_disable_timestamp() work")

    Fixes: b90e5794c5bd ("net: dont call jump_label_dec from irq context")
    Signed-off-by: Eric Dumazet
    Reported-by: Dmitry Vyukov
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • KMSAN (KernelMemorySanitizer, a new error detection tool) reports use of
    uninitialized memory in packet_bind_spkt():
    Acked-by: Eric Dumazet

    ==================================================================
    BUG: KMSAN: use of unitialized memory
    CPU: 0 PID: 1074 Comm: packet Not tainted 4.8.0-rc6+ #1891
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
    01/01/2011
    0000000000000000 ffff88006b6dfc08 ffffffff82559ae8 ffff88006b6dfb48
    ffffffff818a7c91 ffffffff85b9c870 0000000000000092 ffffffff85b9c550
    0000000000000000 0000000000000092 00000000ec400911 0000000000000002
    Call Trace:
    [< inline >] __dump_stack lib/dump_stack.c:15
    [] dump_stack+0x238/0x290 lib/dump_stack.c:51
    [] kmsan_report+0x276/0x2e0 mm/kmsan/kmsan.c:1003
    [] __msan_warning+0x5b/0xb0
    mm/kmsan/kmsan_instr.c:424
    [< inline >] strlen lib/string.c:484
    [] strlcpy+0x9d/0x200 lib/string.c:144
    [] packet_bind_spkt+0x144/0x230
    net/packet/af_packet.c:3132
    [] SYSC_bind+0x40d/0x5f0 net/socket.c:1370
    [] SyS_bind+0x82/0xa0 net/socket.c:1356
    [] entry_SYSCALL_64_fastpath+0x13/0x8f
    arch/x86/entry/entry_64.o:?
    chained origin: 00000000eba00911
    [] save_stack_trace+0x27/0x50
    arch/x86/kernel/stacktrace.c:67
    [< inline >] kmsan_save_stack_with_flags mm/kmsan/kmsan.c:322
    [< inline >] kmsan_save_stack mm/kmsan/kmsan.c:334
    [] kmsan_internal_chain_origin+0x118/0x1e0
    mm/kmsan/kmsan.c:527
    [] __msan_set_alloca_origin4+0xc3/0x130
    mm/kmsan/kmsan_instr.c:380
    [] SYSC_bind+0x129/0x5f0 net/socket.c:1356
    [] SyS_bind+0x82/0xa0 net/socket.c:1356
    [] entry_SYSCALL_64_fastpath+0x13/0x8f
    arch/x86/entry/entry_64.o:?
    origin description: ----address@SYSC_bind (origin=00000000eb400911)
    ==================================================================
    (the line numbers are relative to 4.8-rc6, but the bug persists
    upstream)

    , when I run the following program as root:

    =====================================
    #include <string.h>
    #include <sys/socket.h>
    #include <arpa/inet.h>
    #include <linux/if_ether.h>

    int main() {
      struct sockaddr addr;
      memset(&addr, 0xff, sizeof(addr));
      addr.sa_family = AF_PACKET;
      int fd = socket(PF_PACKET, SOCK_PACKET, htons(ETH_P_ALL));
      bind(fd, &addr, sizeof(addr));
      return 0;
    }
    =====================================

    This happens because addr.sa_data copied from the userspace is not
    zero-terminated, and copying it with strlcpy() in packet_bind_spkt()
    results in calling strlen() on the kernel copy of that non-terminated
    buffer.
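
One way to avoid running strlen() over a buffer that may lack a terminator is to copy it by its fixed size and terminate the copy explicitly. The following is a userspace sketch of that idea under stated assumptions, not necessarily the exact change made in packet_bind_spkt(): `NAME_BUF_LEN` and `copy_device_name()` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Room for every byte of sa_data plus a terminator we add ourselves. */
#define NAME_BUF_LEN (sizeof(((struct sockaddr *)0)->sa_data) + 1)

/*
 * copy_device_name() is a hypothetical helper: it copies sa_data by its
 * fixed size, never by searching for a NUL, then terminates the copy.
 */
static void copy_device_name(char name[NAME_BUF_LEN],
                             const struct sockaddr *addr)
{
    assert(name != NULL && addr != NULL);
    memcpy(name, addr->sa_data, sizeof(addr->sa_data));
    name[sizeof(addr->sa_data)] = '\0';
}
```

With the all-0xff address from the reproducer, the copy is a valid string whose length equals sizeof(addr.sa_data), so later string functions stay within the local buffer.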

    Signed-off-by: Alexander Potapenko
    Signed-off-by: David S. Miller

    Alexander Potapenko
     
  • Even with multicast flooding turned off, IPv6 ND should still work so
    that IPv6 connectivity is provided. Allow this by continuing to flood
    multicast traffic originated by us.
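
The rule stated above can be written as a small predicate. This is an illustrative sketch only: `struct port_model` and `should_flood_mcast()` are hypothetical and do not mirror the bridge code's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a bridge port; not the kernel's data structure. */
struct port_model {
    bool mcast_flood;   /* per-port multicast flood flag */
};

/*
 * Decide whether an unregistered multicast frame goes out of this port.
 * locally_originated is true when the bridge itself sent the frame.
 */
static bool should_flood_mcast(const struct port_model *port,
                               bool locally_originated)
{
    assert(port != NULL);
    return port->mcast_flood || locally_originated;
}
```

Traffic forwarded on behalf of other hosts still honours the flag, while locally originated frames such as IPv6 ND are flooded regardless.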

    Fixes: b6cb5ac8331b ("net: bridge: add per-port multicast flood flag")
    Cc: Nikolay Aleksandrov
    Signed-off-by: Mike Manning
    Acked-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Mike Manning
     
  • Pull NFS client updates from Anna Schumaker:
    "Highlights include:

    Stable bugfixes:
    - NFSv4: Fix memory and state leak in _nfs4_open_and_get_state
    - xprtrdma: Fix Read chunk padding
    - xprtrdma: Per-connection pad optimization
    - xprtrdma: Disable pad optimization by default
    - xprtrdma: Reduce required number of send SGEs
    - nlm: Ensure callback code also checks that the files match
    - pNFS/flexfiles: If the layout is invalid, it must be updated before
    retrying
    - NFSv4: Fix reboot recovery in copy offload
    - Revert "NFSv4.1: Handle NFS4ERR_BADSESSION/NFS4ERR_DEADSESSION
    replies to OP_SEQUENCE"
    - NFSv4: fix getacl head length estimation
    - NFSv4: fix getacl ERANGE for some ACL buffer sizes

    Features:
    - Add and use dprintk_cont macros
    - Various cleanups to NFS v4.x to reduce code duplication and
    complexity
    - Remove unused cr_magic related code
    - Improvements to sunrpc "read from buffer" code
    - Clean up sunrpc timeout code and allow changing TCP timeout
    parameters
    - Remove duplicate mw_list management code in xprtrdma
    - Add generic functions for encoding and decoding xdr streams

    Bugfixes:
    - Clean up nfs_show_mountd_netid
    - Make layoutreturn_ops static and use NULL instead of 0 to fix
    sparse warnings
    - Properly handle -ERESTARTSYS in nfs_rename()
    - Check if register_shrinker() failed during rpcauth_init()
    - Properly clean up procfs/pipefs entries
    - Various NFS over RDMA related fixes
    - Silence uninitialized variable warning in sunrpc"

    * tag 'nfs-for-4.11-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (64 commits)
    NFSv4: fix getacl ERANGE for some ACL buffer sizes
    NFSv4: fix getacl head length estimation
    Revert "NFSv4.1: Handle NFS4ERR_BADSESSION/NFS4ERR_DEADSESSION replies to OP_SEQUENCE"
    NFSv4: Fix reboot recovery in copy offload
    pNFS/flexfiles: If the layout is invalid, it must be updated before retrying
    NFSv4: Clean up owner/group attribute decode
    SUNRPC: Add a helper function xdr_stream_decode_string_dup()
    NFSv4: Remove bogus "struct nfs_client" argument from decode_ace()
    NFSv4: Fix the underestimation of delegation XDR space reservation
    NFSv4: Replace callback string decode function with a generic
    NFSv4: Replace the open coded decode_opaque_inline() with the new generic
    NFSv4: Replace ad-hoc xdr encode/decode helpers with xdr_stream_* generics
    SUNRPC: Add generic helpers for xdr_stream encode/decode
    sunrpc: silence uninitialized variable warning
    nlm: Ensure callback code also checks that the files match
    sunrpc: Allow xprt->ops->timer method to sleep
    xprtrdma: Refactor management of mw_list field
    xprtrdma: Handle stale connection rejection
    xprtrdma: Properly recover FRWRs with in-flight FASTREG WRs
    xprtrdma: Shrink send SGEs array
    ...

    Linus Torvalds