25 Jan, 2016

2 commits

  • Pull 9p updates from Eric Van Hensbergen:
    "Sorry for the last minute pull request, there was a change that
    didn't get pulled into for-next until two weeks ago and I wanted to
    give it some bake time.

    Summary:

    Rework and error handling fixes, primarily in the fscache and fd
    transports"

    * tag 'for-linus-4.5-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
    fs/9p: use fscache mutex rather than spinlock
    9p: trans_fd, bail out if recv fcall if missing
    9p: trans_fd, read rework to use p9_parse_header
    net/9p: Add device name details on error

    Linus Torvalds
     
  • Pull Ceph updates from Sage Weil:
    "The two main changes are aio support in CephFS, and a series that
    fixes several issues in the authentication key timeout/renewal code.

    On top of that are a variety of cleanups and minor bug fixes"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
    libceph: remove outdated comment
    libceph: kill off ceph_x_ticket_handler::validity
    libceph: invalidate AUTH in addition to a service ticket
    libceph: fix authorizer invalidation, take 2
    libceph: clear messenger auth_retry flag if we fault
    libceph: fix ceph_msg_revoke()
    libceph: use list_for_each_entry_safe
    ceph: use i_size_{read,write} to get/set i_size
    ceph: re-send AIO write request when getting -EOLDSNAP error
    ceph: Asynchronous IO support
    ceph: Avoid to propagate the invalid page point
    ceph: fix double page_unlock() in page_mkwrite()
    rbd: delete an unnecessary check before rbd_dev_destroy()
    libceph: use list_next_entry instead of list_entry_next
    ceph: ceph_frag_contains_value can be boolean
    ceph: remove unused functions in ceph_frag.h

    Linus Torvalds
     

24 Jan, 2016

2 commits

  • Pull rdma updates from Doug Ledford:
    "Initial roundup of 4.5 merge window patches

    - Remove usage of ib_query_device and instead store attributes in
    ib_device struct

    - Move iopoll out of block and into lib, rename to irqpoll, and use
    in several places in the rdma stack as our new completion queue
    polling library mechanism. Update the other block drivers that
    already used iopoll to use the new mechanism too.

    - Replace the per-entry GID table locks with a single GID table lock

    - IPoIB multicast cleanup

    - Cleanups to the IB MR facility

    - Add support for 64bit extended IB counters

    - Fix for netlink oops while parsing RDMA nl messages

    - RoCEv2 support for the core IB code

    - mlx4 RoCEv2 support

    - mlx5 RoCEv2 support

    - Cross Channel support for mlx5

    - Timestamp support for mlx5

    - Atomic support for mlx5

    - Raw QP support for mlx5

    - MAINTAINERS update for mlx4/mlx5

    - Misc ocrdma, qib, nes, usNIC, cxgb3, cxgb4, mlx4, mlx5 updates

    - Add support for remote invalidate to the iSER driver (pushed
    through the RDMA tree due to dependencies, acknowledged by nab)

    - Update to NFSoRDMA (pushed through the RDMA tree due to
    dependencies, acknowledged by Bruce)"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (169 commits)
    IB/mlx5: Unify CQ create flags check
    IB/mlx5: Expose Raw Packet QP to user space consumers
    {IB, net}/mlx5: Move the modify QP operation table to mlx5_ib
    IB/mlx5: Support setting Ethernet priority for Raw Packet QPs
    IB/mlx5: Add Raw Packet QP query functionality
    IB/mlx5: Add create and destroy functionality for Raw Packet QP
    IB/mlx5: Refactor mlx5_ib_qp to accommodate other QP types
    IB/mlx5: Allocate a Transport Domain for each ucontext
    net/mlx5_core: Warn on unsupported events of QP/RQ/SQ
    net/mlx5_core: Add RQ and SQ event handling
    net/mlx5_core: Export transport objects
    IB/mlx5: Expose CQE version to user-space
    IB/mlx5: Add CQE version 1 support to user QPs and SRQs
    IB/mlx5: Fix data validation in mlx5_ib_alloc_ucontext
    IB/sa: Fix netlink local service GFP crash
    IB/srpt: Remove redundant wc array
    IB/qib: Improve ipoib UD performance
    IB/mlx4: Advertise RoCE v2 support
    IB/mlx4: Create and use another QP1 for RoCEv2
    IB/mlx4: Enable send of RoCE QP1 packets with IP/UDP headers
    ...

    Linus Torvalds
     
  • Pull final vfs updates from Al Viro:

    - The ->i_mutex wrappers (with small prereq in lustre)

    - a fix for too early freeing of symlink bodies on shmem (they need to
    be RCU-delayed) (-stable fodder)

    - followup to dedupe stuff merged this cycle

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    vfs: abort dedupe loop if fatal signals are pending
    make sure that freeing shmem fast symlinks is RCU-delayed
    wrappers for ->i_mutex access
    lustre: remove unused declaration

    Linus Torvalds
     

23 Jan, 2016

2 commits

  • There are many locations that do

        if (memory_was_allocated_by_vmalloc)
            vfree(ptr);
        else
            kfree(ptr);

    but kvfree() can handle both kmalloc()ed memory and vmalloc()ed memory
    using is_vmalloc_addr(). Unless callers have special reasons, we can
    replace this branch with kvfree(). Please check and reply if you find
    any problems.

    Signed-off-by: Tetsuo Handa
    Acked-by: Michal Hocko
    Acked-by: Jan Kara
    Acked-by: Russell King
    Reviewed-by: Andreas Dilger
    Acked-by: "Rafael J. Wysocki"
    Acked-by: David Rientjes
    Cc: "Luck, Tony"
    Cc: Oleg Drokin
    Cc: Boris Petkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tetsuo Handa
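
    The branch removal described above can be illustrated in userspace. This
    is a minimal sketch, not kernel code: a static arena stands in for the
    vmalloc range, malloc()/free() stand in for kmalloc()/kfree(), and every
    demo_* name is hypothetical. The point is only that the callee can pick
    the release path from the address, so callers need no flag.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the vmalloc address range: one static arena. */
static unsigned char demo_arena[4096];
static size_t demo_arena_used;
static int demo_arena_frees;    /* frees routed to the arena path */
static int demo_heap_frees;     /* frees routed to the heap path */

/* Plays the role of vmalloc(): bump-allocate from the arena. */
static void *demo_arena_alloc(size_t size)
{
    void *p;

    if (size > sizeof(demo_arena) - demo_arena_used)
        return NULL;
    p = demo_arena + demo_arena_used;
    demo_arena_used += size;
    return p;
}

/* Plays the role of is_vmalloc_addr(): a pure address-range test. */
static bool demo_is_arena_addr(const void *p)
{
    uintptr_t a = (uintptr_t)p;
    uintptr_t start = (uintptr_t)demo_arena;

    return a >= start && a < start + sizeof(demo_arena);
}

/* Plays the role of kvfree(): the callee chooses the release path. */
static void demo_kvfree(void *p)
{
    if (demo_is_arena_addr(p)) {
        demo_arena_frees++;     /* a bump arena has nothing to release */
    } else {
        demo_heap_frees++;
        free(p);
    }
}
```

    A caller that used to track how a buffer was allocated can then call
    demo_kvfree(ptr) unconditionally.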
     
  • parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
    inode_foo(inode) being mutex_foo(&inode->i_mutex).

    Please, use those for access to ->i_mutex; over the coming cycle
    ->i_mutex will become rwsem, with ->lookup() done with it held
    only shared.

    Signed-off-by: Al Viro

    Al Viro
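
    The wrapper pattern above can be sketched as follows. This is a
    userspace illustration under stated assumptions: demo_mutex is a toy,
    single-threaded stand-in for the kernel mutex, and all demo_* names are
    hypothetical. It shows why the wrappers help: callers name the inode,
    not the lock field, so the lock type can change behind the wrappers.

```c
#include <stdbool.h>

/* Toy, single-threaded stand-in for a mutex (an assumption of this sketch). */
struct demo_mutex {
    bool locked;
};

static void demo_mutex_lock(struct demo_mutex *m)
{
    m->locked = true;
}

static void demo_mutex_unlock(struct demo_mutex *m)
{
    m->locked = false;
}

static bool demo_mutex_trylock(struct demo_mutex *m)
{
    if (m->locked)
        return false;
    m->locked = true;
    return true;
}

static bool demo_mutex_is_locked(const struct demo_mutex *m)
{
    return m->locked;
}

struct demo_inode {
    struct demo_mutex i_mutex;
};

/* inode_foo(inode) is mutex_foo(&inode->i_mutex): one wrapper per operation. */
static inline void demo_inode_lock(struct demo_inode *inode)
{
    demo_mutex_lock(&inode->i_mutex);
}

static inline void demo_inode_unlock(struct demo_inode *inode)
{
    demo_mutex_unlock(&inode->i_mutex);
}

static inline bool demo_inode_trylock(struct demo_inode *inode)
{
    return demo_mutex_trylock(&inode->i_mutex);
}

static inline bool demo_inode_is_locked(const struct demo_inode *inode)
{
    return demo_mutex_is_locked(&inode->i_mutex);
}
```

    If the field later changes type, only the four wrapper bodies need to
    follow; call sites such as demo_inode_lock(inode) stay as they are.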
     

22 Jan, 2016

8 commits

  • MClientMount{,Ack} are long gone. The receipt of bare monmap doesn't
    actually indicate a mount success as we are yet to authenticate at that
    point in time.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     
  • With it gone, no need to preserve ceph_timespec in process_one_ticket()
    either.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil

    Ilya Dryomov
     
  • If we fault due to authentication, we invalidate the service ticket we
    have and request a new one - the idea being that if a service rejected
    our authorizer, it must have expired, despite mon_client's attempts at
    periodic renewal. (The other possibility is that our ticket is too new
    and the service hasn't gotten it yet, in which case invalidating isn't
    necessary but doesn't hurt.)

    Invalidating just the service ticket is not enough, though. If we
    assume a failure on mon_client's part to renew a service ticket, we
    have to assume the same for the AUTH ticket. If our AUTH ticket is
    bad, we won't get any service tickets no matter how hard we try, so
    invalidate AUTH ticket along with the service ticket.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil

    Ilya Dryomov
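
    The reasoning above can be sketched as a small model. It is not the
    libceph code: the service ids, the structures and the demo_* helpers are
    assumptions made for illustration. It shows an authentication fault
    against one service invalidating both that service's ticket and the AUTH
    ticket, so that the next renewal pass asks for both.

```c
#include <stdbool.h>

/* Hypothetical service ids; DEMO_AUTH plays the role of the AUTH ticket. */
enum demo_service {
    DEMO_AUTH,
    DEMO_MON,
    DEMO_OSD,
    DEMO_MDS,
    DEMO_NUM_SERVICES
};

struct demo_ticket {
    bool have_key;              /* false means "must be requested again" */
};

struct demo_auth_client {
    struct demo_ticket tickets[DEMO_NUM_SERVICES];
};

/* Forget one ticket so the next renewal pass has to request it. */
static void demo_invalidate_ticket(struct demo_auth_client *ac,
                                   enum demo_service s)
{
    ac->tickets[s].have_key = false;
}

/*
 * A service rejected our authorizer: drop its ticket and, because the same
 * renewal failure may have hit it too, the AUTH ticket as well.
 */
static void demo_invalidate_authorizer(struct demo_auth_client *ac,
                                       enum demo_service peer)
{
    demo_invalidate_ticket(ac, peer);
    demo_invalidate_ticket(ac, DEMO_AUTH);
}

/* Bitmask of services whose tickets have to be requested. */
static unsigned demo_tickets_needed(const struct demo_auth_client *ac)
{
    unsigned need = 0;
    int s;

    for (s = 0; s < DEMO_NUM_SERVICES; s++)
        if (!ac->tickets[s].have_key)
            need |= 1u << s;
    return need;
}
```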
     
  • Back in 2013, commit 4b8e8b5d78b8 ("libceph: fix authorizer
    invalidation") tried to fix authorizer invalidation issues by clearing
    validity field. However, nothing ever consults this field, so it
    doesn't force us to request any new secrets in any way and therefore we
    never get out of the exponential backoff mode:

    [ 129.973812] libceph: osd2 192.168.122.1:6810 connect authorization failure
    [ 130.706785] libceph: osd2 192.168.122.1:6810 connect authorization failure
    [ 131.710088] libceph: osd2 192.168.122.1:6810 connect authorization failure
    [ 133.708321] libceph: osd2 192.168.122.1:6810 connect authorization failure
    [ 137.706598] libceph: osd2 192.168.122.1:6810 connect authorization failure
    ...

    AFAICT this was the case at the time 4b8e8b5d78b8 was merged, too.

    Using timespec solely as a bool isn't nice, so introduce a new have_key
    flag, specifically for this purpose.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil

    Ilya Dryomov
     
  • Commit 20e55c4cc758 ("libceph: clear messenger auth_retry flag when we
    authenticate") got us only half way there. We clear the flag if the
    second attempt succeeds, but it also needs to be cleared if that
    attempt fails, to allow for the exponential backoff to kick in.
    Otherwise, if ->should_authenticate() thinks our keys are valid, we
    will busy loop, incrementing auth_retry to no avail:

    process_connect ffff880079a63830 got BADAUTHORIZER attempt 1
    process_connect ffff880079a63830 got BADAUTHORIZER attempt 2
    process_connect ffff880079a63830 got BADAUTHORIZER attempt 3
    process_connect ffff880079a63830 got BADAUTHORIZER attempt 4
    process_connect ffff880079a63830 got BADAUTHORIZER attempt 5
    ...

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil

    Ilya Dryomov
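
    The intended behavior can be sketched as a tiny state machine. This is a
    simplified model, not the messenger code: auth_retry is reduced to a
    boolean, and the delay constants and demo_* names are hypothetical. It
    shows the flag being cleared on the fault path, so that each pair of
    rejections ends in a fault with a growing delay instead of an endless
    run of immediate retries.

```c
#include <stdbool.h>

#define DEMO_BASE_DELAY_MS 250u     /* hypothetical first backoff step */
#define DEMO_MAX_DELAY_MS  8000u    /* hypothetical backoff cap */

struct demo_con {
    bool auth_retry;        /* set after the first rejected authorizer */
    unsigned delay_ms;      /* current reconnect delay, 0 = no backoff yet */
    unsigned faults;        /* how many times the fault path ran */
};

enum demo_action {
    DEMO_RETRY_NOW,
    DEMO_FAULT
};

/* Fault path: clear the retry flag and grow the delay, up to the cap. */
static void demo_con_fault(struct demo_con *con)
{
    con->auth_retry = false;
    con->delay_ms = con->delay_ms ? con->delay_ms * 2 : DEMO_BASE_DELAY_MS;
    if (con->delay_ms > DEMO_MAX_DELAY_MS)
        con->delay_ms = DEMO_MAX_DELAY_MS;
    con->faults++;
}

/* The peer rejected our authorizer. */
static enum demo_action demo_bad_authorizer(struct demo_con *con)
{
    if (!con->auth_retry) {
        con->auth_retry = true;     /* first rejection: retry once, now */
        return DEMO_RETRY_NOW;
    }
    demo_con_fault(con);            /* second rejection: fault, back off */
    return DEMO_FAULT;
}

/* Successful authentication resets both the flag and the delay. */
static void demo_authenticated(struct demo_con *con)
{
    con->auth_retry = false;
    con->delay_ms = 0;
}
```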
     
  • There are a number of problems with revoking a "was sending" message:

    (1) We never make any attempt to revoke data - only kvecs contribute to
    con->out_skip. However, once the header (envelope) is written to the
    socket, our peer learns data_len and sets itself to expect at least
    data_len bytes to follow front or front+middle. If ceph_msg_revoke()
    is called while the messenger is sending message's data portion,
    anything we send after that call is counted by the OSD towards the now
    revoked message's data portion. The effects vary, the most common one
    is the eventual hang - higher layers get stuck waiting for the reply to
    the message that was sent out after ceph_msg_revoke() returned and
    treated by the OSD as a bunch of data bytes. This is what Matt ran
    into.

    (2) Flat out zeroing con->out_kvec_bytes worth of bytes to handle kvecs
    is wrong. If ceph_msg_revoke() is called before the tag is sent out or
    while the messenger is sending the header, we will get a connection
    reset, either due to a bad tag (0 is not a valid tag) or a bad header
    CRC, which kind of defeats the purpose of revoke. Currently the kernel
    client refuses to work with header CRCs disabled, but that will likely
    change in the future, making this even worse.

    (3) con->out_skip is not reset on connection reset, leading to one or
    more spurious connection resets if we happen to get a real one after
    con->out_skip is set in ceph_msg_revoke() but before it's cleared in
    write_partial_skip().

    Fixing (1) and (3) is trivial. The idea behind fixing (2) is to never
    zero the tag or the header, i.e. send out tag+header regardless of when
    ceph_msg_revoke() is called. That way the header is always correct, no
    unnecessary resets are induced and revoke stands ready for disabled
    CRCs. Since ceph_msg_revoke() rips out con->out_msg, introduce a new
    "message out temp" and copy the header into it before sending.

    Cc: stable@vger.kernel.org # 4.0+
    Reported-by: Matt Conner
    Signed-off-by: Ilya Dryomov
    Tested-by: Matt Conner
    Reviewed-by: Sage Weil

    Ilya Dryomov
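
    The byte accounting behind points (1) and (2) can be sketched with
    plain arithmetic. This is a simplified model under stated assumptions,
    not the actual con->out_skip bookkeeping: the message is treated as one
    flat byte stream, and the structures and demo_* names are hypothetical.
    It shows the tag and header always going out unmodified, and everything
    after them that the peer still expects, data included, being counted as
    bytes to skip.

```c
#include <stddef.h>

/* Hypothetical wire layout of one outgoing message, sizes in bytes. */
struct demo_msg {
    size_t envelope;    /* tag + header */
    size_t front;
    size_t middle;
    size_t data;
    size_t footer;
};

struct demo_revoke_plan {
    size_t send_real;   /* envelope bytes that must still go out intact */
    size_t skip;        /* remaining bytes the peer expects, sent as filler */
};

static size_t demo_msg_total(const struct demo_msg *m)
{
    return m->envelope + m->front + m->middle + m->data + m->footer;
}

/*
 * Plan the revoke of a message that is currently being sent.
 * sent = bytes of this message already written to the socket.
 */
static struct demo_revoke_plan demo_revoke(const struct demo_msg *m,
                                           size_t sent)
{
    struct demo_revoke_plan plan = { 0, 0 };
    size_t total = demo_msg_total(m);

    if (sent >= total)
        return plan;                        /* fully sent: nothing to do */

    if (sent < m->envelope) {
        /* Never zero the tag or header: finish them, skip the rest. */
        plan.send_real = m->envelope - sent;
        plan.skip = total - m->envelope;
    } else {
        /* Header is out, so the peer expects every remaining byte. */
        plan.skip = total - sent;
    }
    return plan;
}
```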
     
  • Use list_for_each_entry_safe() instead of list_for_each_safe() to
    simplify the code.

    Signed-off-by: Geliang Tang
    [idryomov@gmail.com: nuke call to list_splice_init() as well]
    Signed-off-by: Ilya Dryomov

    Geliang Tang
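
    What the entry-based safe iterator buys can be shown with a userspace
    re-implementation. This is a sketch, not the kernel's list.h: the
    demo_* macros and helpers are simplified look-alikes, and __typeof__ is
    a GCC/Clang extension. The loop hands out the containing structure
    directly and keeps a lookahead pointer, so the current entry can be
    unlinked and freed inside the body.

```c
#include <stddef.h>
#include <stdlib.h>

struct demo_list_head {
    struct demo_list_head *next, *prev;
};

#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified look-alike of list_for_each_entry_safe(). */
#define demo_list_for_each_entry_safe(pos, n, head, member)                  \
    for (pos = demo_container_of((head)->next, __typeof__(*pos), member),    \
         n = demo_container_of(pos->member.next, __typeof__(*pos), member);  \
         &pos->member != (head);                                             \
         pos = n,                                                            \
         n = demo_container_of(n->member.next, __typeof__(*n), member))

static void demo_list_init(struct demo_list_head *h)
{
    h->next = h->prev = h;
}

static void demo_list_add_tail(struct demo_list_head *item,
                               struct demo_list_head *head)
{
    item->prev = head->prev;
    item->next = head;
    head->prev->next = item;
    head->prev = item;
}

static void demo_list_del(struct demo_list_head *item)
{
    item->prev->next = item->next;
    item->next->prev = item->prev;
}

struct demo_item {
    int value;
    struct demo_list_head link;
};

/* Unlink and free every item; returns the sum of the values seen. */
static int demo_drain(struct demo_list_head *head)
{
    struct demo_item *it, *next;
    int sum = 0;

    demo_list_for_each_entry_safe(it, next, head, link) {
        sum += it->value;
        demo_list_del(&it->link);
        free(it);               /* safe: "next" was fetched beforehand */
    }
    return sum;
}
```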
     
  • list_next_entry has been defined in list.h, so I replace list_entry_next
    with it.

    Signed-off-by: Geliang Tang
    Signed-off-by: Ilya Dryomov

    Geliang Tang
     

21 Jan, 2016

4 commits

  • tcp_memcontrol.c only contains legacy memory.tcp.kmem.* file definitions
    and mem_cgroup->tcp_mem init/destroy stuff. This doesn't belong to
    network subsys. Let's move it to memcontrol.c. This also allows us to
    reuse generic code for handling legacy memcg files.

    Signed-off-by: Vladimir Davydov
    Acked-by: Johannes Weiner
    Cc: "David S. Miller"
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • Let the user know that CONFIG_MEMCG_KMEM does not apply to the cgroup2
    interface. This also makes legacy-only code sections stand out better.

    [arnd@arndb.de: mm: memcontrol: only manage socket pressure for CONFIG_INET]
    Signed-off-by: Johannes Weiner
    Cc: Michal Hocko
    Cc: Tejun Heo
    Acked-by: Vladimir Davydov
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • This series adds accounting of the historical "kmem" memory consumers to
    the cgroup2 memory controller.

    These consumers include the dentry cache, the inode cache, kernel stack
    pages, and a few others that are pointed out in patch 7/8. The
    footprint of these consumers is directly tied to userspace activity in
    common workloads, and so they have to be part of the minimally viable
    configuration in order to present a complete feature to our users.

    The cgroup2 interface of the memory controller is far from complete, but
    this series, along with the socket memory accounting series, provides
    the final semantic changes for the existing memory knobs in the cgroup2
    interface, which is scheduled for initial release in the next merge
    window.

    This patch (of 8):

    Remove unused css argument from memcg_init_kmem().

    Signed-off-by: Johannes Weiner
    Acked-by: Michal Hocko
    Cc: Tejun Heo
    Acked-by: Vladimir Davydov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • With upcoming CONFIG_UBSAN the following BUILD_BUG_ON in
    net/mac80211/debugfs.c starts to trigger:

    BUILD_BUG_ON(hw_flag_names[NUM_IEEE80211_HW_FLAGS] != (void *)0x1);

    It seems that compiler instrumentation causes some code
    deoptimizations. Because of that, GCC is not able to resolve the
    condition in BUILD_BUG_ON() at compile time.

    We could make the size of the hw_flag_names array unspecified and
    replace the condition in BUILD_BUG_ON() with the following:

    ARRAY_SIZE(hw_flag_names) != NUM_IEEE80211_HW_FLAGS

    That will have the same effect as before (adding a new flag without
    updating the array will trigger a build failure) except it doesn't fail
    with CONFIG_UBSAN. As a bonus, this patch slightly decreases the size of
    the hw_flag_names array.

    Signed-off-by: Andrey Ryabinin
    Cc: Johannes Berg
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Ryabinin
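
    The size-based check described above can be sketched in standard C11.
    This is an illustration with hypothetical names (the demo_* identifiers
    and the three flags are made up), and it uses _Static_assert where the
    kernel uses BUILD_BUG_ON(). The array has no explicit size, so its
    length comes from the initializer alone, and the assertion compares
    that length with the enum's terminating value.

```c
#include <stddef.h>

#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

enum demo_hw_flag {
    DEMO_FLAG_A,
    DEMO_FLAG_B,
    DEMO_FLAG_C,
    NUM_DEMO_FLAGS          /* keep last */
};

/* No explicit size: the initializer list alone determines the length. */
static const char *const demo_flag_names[] = {
    [DEMO_FLAG_A] = "FLAG_A",
    [DEMO_FLAG_B] = "FLAG_B",
    [DEMO_FLAG_C] = "FLAG_C",
};

/*
 * sizeof is an integer constant expression, so this is resolved at compile
 * time whatever the optimizer or instrumentation does.
 */
_Static_assert(DEMO_ARRAY_SIZE(demo_flag_names) == NUM_DEMO_FLAGS,
               "demo_flag_names is out of sync with enum demo_hw_flag");

static const char *demo_flag_name(enum demo_hw_flag f)
{
    return (unsigned)f < NUM_DEMO_FLAGS ? demo_flag_names[f] : "?";
}
```

    Appending DEMO_FLAG_D to the enum without adding a name leaves the array
    one element short, and the build stops at the assertion.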
     

20 Jan, 2016

11 commits

    We now always have a per-PD local_dma_lkey available. Make use of that
    fact in svc_rdma and stop registering our own MR.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Jason Gunthorpe
    Reviewed-by: Chuck Lever
    Reviewed-by: Steve Wise
    Acked-by: J. Bruce Fields
    Signed-off-by: Doug Ledford

    Christoph Hellwig
     
  • To support the server-side of an NFSv4.1 backchannel on RDMA
    connections, add a transport class that enables backward
    direction messages on an existing forward channel connection.

    Signed-off-by: Chuck Lever
    Acked-by: Bruce Fields
    Signed-off-by: Doug Ledford

    Chuck Lever
     
  • Extra resources for handling backchannel requests have to be
    pre-allocated when a transport instance is created. Set up
    additional fields in svcxprt_rdma to track these resources.

    The max_requests fields are elements of the RPC-over-RDMA
    protocol, so they should be u32. To ensure that unsigned
    arithmetic is used everywhere, some other fields in the
    svcxprt_rdma struct are updated.

    Signed-off-by: Chuck Lever
    Acked-by: Bruce Fields
    Signed-off-by: Doug Ledford

    Chuck Lever
     
  • Pre-requisite to use map_xdr in the backchannel code.

    Signed-off-by: Chuck Lever
    Acked-by: Bruce Fields
    Signed-off-by: Doug Ledford

    Chuck Lever
     
  • Clean up.

    These functions can otherwise fail, so check for page allocation
    failures too.

    Signed-off-by: Chuck Lever
    Acked-by: Bruce Fields
    Signed-off-by: Doug Ledford

    Chuck Lever
     
  • svc_rdma_post_recv() allocates pages for receive buffers on-demand.
    It uses GFP_KERNEL so the allocator tries hard, and may sleep. But
    I'm about to add a call to svc_rdma_post_recv() from a function
    that may not sleep.

    Since all svc_rdma_post_recv() call sites can tolerate its failure,
    allow it to fail if the page allocator returns nothing. Longer term,
    receive buffers, being a finite resource per-connection, should be
    pre-allocated and re-used.

    Signed-off-by: Chuck Lever
    Acked-by: Bruce Fields
    Signed-off-by: Doug Ledford

    Chuck Lever
     
  • Clean up.

    Signed-off-by: Chuck Lever
    Acked-by: Bruce Fields
    Signed-off-by: Doug Ledford

    Chuck Lever
     
  • To ensure this allocation cannot fail and will not sleep,
    pre-allocate the req_map structures per-connection.

    Signed-off-by: Chuck Lever
    Acked-by: Bruce Fields
    Signed-off-by: Doug Ledford

    Chuck Lever
     
  • When the maximum payload size of NFS READ and WRITE was increased
    by commit cc9a903d915c ("svcrdma: Change maximum server payload back
    to RPCSVC_MAXPAYLOAD"), the size of struct svc_rdma_op_ctxt
    increased to over 6KB (on x86_64). That makes allocating one of
    these from a kmem_cache more likely to fail in situations when
    system memory is exhausted.

    Since I'm about to add a caller where this allocation must always
    work _and_ it cannot sleep, pre-allocate ctxts for each connection.

    Another motivation for this change is that NFSv4.x servers are
    required by specification not to drop NFS requests. Pre-allocating
    memory resources reduces the likelihood of a drop.

    Signed-off-by: Chuck Lever
    Acked-by: Bruce Fields
    Signed-off-by: Doug Ledford

    Chuck Lever
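
    The pre-allocation idea above can be sketched as a per-connection free
    list. This is a userspace model, not the svcrdma code: the structures
    and demo_* names are hypothetical, calloc() stands in for the kernel
    allocator, and locking is left out because the sketch is
    single-threaded. All allocation happens at connection setup, so taking
    a context later involves no allocator call.

```c
#include <stddef.h>
#include <stdlib.h>

struct demo_ctxt {
    struct demo_ctxt *next_free;
    char payload[64];           /* stand-in for the real context body */
};

struct demo_conn {
    struct demo_ctxt *slab;     /* one allocation backing every ctxt */
    struct demo_ctxt *free_list;
    unsigned free_count;
};

/* Connection setup: the only place that allocates, and it may fail. */
static int demo_conn_init(struct demo_conn *c, unsigned nr)
{
    unsigned i;

    c->free_list = NULL;
    c->free_count = 0;
    c->slab = calloc(nr, sizeof(*c->slab));
    if (!c->slab)
        return -1;
    for (i = 0; i < nr; i++) {
        c->slab[i].next_free = c->free_list;
        c->free_list = &c->slab[i];
        c->free_count++;
    }
    return 0;
}

/*
 * Hot path: pops a pre-allocated ctxt without calling the allocator.
 * Returns NULL only when the caller exceeds the connection's budget.
 */
static struct demo_ctxt *demo_ctxt_get(struct demo_conn *c)
{
    struct demo_ctxt *ctxt = c->free_list;

    if (ctxt) {
        c->free_list = ctxt->next_free;
        c->free_count--;
    }
    return ctxt;
}

/* Return a ctxt to circulation. */
static void demo_ctxt_put(struct demo_conn *c, struct demo_ctxt *ctxt)
{
    ctxt->next_free = c->free_list;
    c->free_list = ctxt;
    c->free_count++;
}

static void demo_conn_destroy(struct demo_conn *c)
{
    free(c->slab);
    c->slab = NULL;
    c->free_list = NULL;
    c->free_count = 0;
}
```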
     
  • Be sure the completed ctxt is put in every path.

    The xprt enqueue can take a while, so put the completed ctxt back
    in circulation _before_ enqueuing the xprt.

    Remove/disable debugging.

    Signed-off-by: Chuck Lever
    Acked-by: Bruce Fields
    Signed-off-by: Doug Ledford

    Chuck Lever
     
  • kzalloc is used here, so setting the atomic fields to zero is
    unnecessary. sc_ord is set again in handle_connect_req. The other
    fields are re-initialized in svc_rdma_accept().

    Signed-off-by: Chuck Lever
    Acked-by: Bruce Fields
    Signed-off-by: Doug Ledford

    Chuck Lever
     

19 Jan, 2016

1 commit

  • It was seen that defective configurations of openvswitch could overwrite
    the STACK_END_MAGIC and cause a hard crash of the kernel because of too
    many recursions within ovs.

    This problem arises due to the high stack usage of openvswitch. The rest
    of the kernel is fine with the current limit of 10 (RECURSION_LIMIT).

    We use the already existing recursion counter in ovs_execute_actions to
    implement an upper bound of 5 recursions.

    Cc: Pravin Shelar
    Cc: Simon Horman
    Cc: Eric Dumazet
    Cc: Simon Horman
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
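
    The bound described above can be sketched with a depth counter. This is
    a toy model rather than the openvswitch code: the demo_* names, the
    plain global counter and the -1 error value are assumptions, as is the
    choice that level 5 itself is still allowed. Each nested execution bumps
    the counter on entry, refuses to go deeper than the limit, and restores
    the counter on every exit path.

```c
#define DEMO_RECURSION_LIMIT 5

/* One global counter is enough for this single-threaded sketch. */
static int demo_exec_level;
static int demo_max_level_seen;

/*
 * nested = how many further nested executions this action list triggers.
 * Returns 0 on success, -1 if the recursion limit would be exceeded.
 */
static int demo_execute_actions(int nested)
{
    int err = 0;

    demo_exec_level++;
    if (demo_exec_level > DEMO_RECURSION_LIMIT) {
        err = -1;                       /* refuse instead of using more stack */
        goto out;
    }
    if (demo_exec_level > demo_max_level_seen)
        demo_max_level_seen = demo_exec_level;

    if (nested > 0)
        err = demo_execute_actions(nested - 1);
out:
    demo_exec_level--;                  /* balanced on every exit path */
    return err;
}
```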
     

18 Jan, 2016

1 commit

  • Re-establish the previous behavior and avoid hashing temporary asocs by
    checking t->asoc->temp in sctp_(un)hash_transport. Also, remove the
    check of t->asoc->temp in __sctp_lookup_association, since they are
    never hashed now.

    Fixes: 4f0087812648 ("sctp: apply rhashtable api to send/recv path")
    Signed-off-by: Xin Long
    Acked-by: Marcelo Ricardo Leitner
    Reported-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Xin Long
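
    The effect of the check can be sketched with a toy table. This is not
    the SCTP rhashtable code: the one-entry-per-slot table, the integer key
    and the demo_* names are assumptions made for brevity. Because the hash
    and unhash helpers both return early for temporary associations, the
    lookup never sees one and needs no check of its own.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_TABLE_SIZE 8

struct demo_asoc {
    bool temp;                  /* temporary association */
};

struct demo_transport {
    struct demo_asoc *asoc;
    unsigned key;
};

/* Toy table: one transport per slot, collisions simply overwrite. */
static struct demo_transport *demo_table[DEMO_TABLE_SIZE];

static void demo_hash_transport(struct demo_transport *t)
{
    if (t->asoc->temp)
        return;                 /* temporary asocs never enter the table */
    demo_table[t->key % DEMO_TABLE_SIZE] = t;
}

static void demo_unhash_transport(struct demo_transport *t)
{
    unsigned slot = t->key % DEMO_TABLE_SIZE;

    if (t->asoc->temp)
        return;                 /* was never hashed, nothing to remove */
    if (demo_table[slot] == t)
        demo_table[slot] = NULL;
}

/* No temp check here: anything found was hashed, hence not temporary. */
static struct demo_transport *demo_lookup(unsigned key)
{
    struct demo_transport *t = demo_table[key % DEMO_TABLE_SIZE];

    return (t && t->key == key) ? t : NULL;
}
```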
     

16 Jan, 2016

9 commits

  • It is not allowed to free the memory of an object which is part of a list
    which is protected by rcu-read-side-critical sections without making sure
    that no other context is accessing the object anymore. This usually happens
    by removing the references to this object and then waiting until the rcu
    grace period is over and no one (allowedly) accesses it anymore.

    But the _now functions ignore this completely. They free the object
    directly even when a different context still tries to access it. This has
    to be avoided and thus these functions must be removed and all functions
    have to use batadv_orig_node_free_ref.

    Fixes: 72822225bd41 ("batman-adv: Fix rcu_barrier() miss due to double call_rcu() in TT code")
    Signed-off-by: Sven Eckelmann
    Signed-off-by: Marek Lindner
    Signed-off-by: Antonio Quartulli

    Sven Eckelmann
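
    The rule above can be modelled in userspace, with the caveat that there
    is no real RCU here: a deferred list stands in for call_rcu(), an
    explicit function call stands in for the end of the grace period, and
    all demo_* names are hypothetical. The sketch shows the shape of a
    free_ref helper that never frees directly. Dropping the last reference
    only schedules the object, and the memory is released once the grace
    period is over.

```c
#include <stddef.h>
#include <stdlib.h>

struct demo_node {
    int refcount;
    struct demo_node *next_deferred;
};

static struct demo_node *demo_deferred;    /* waiting for the grace period */
static int demo_freed;                      /* objects actually released */

static struct demo_node *demo_node_new(void)
{
    struct demo_node *n = calloc(1, sizeof(*n));

    if (n)
        n->refcount = 1;
    return n;
}

/* Plays the role of call_rcu(): remember the object, free it later. */
static void demo_defer_free(struct demo_node *n)
{
    n->next_deferred = demo_deferred;
    demo_deferred = n;
}

/* Drop one reference; the last one schedules the free, never performs it. */
static void demo_node_free_ref(struct demo_node *n)
{
    if (--n->refcount == 0)
        demo_defer_free(n);
}

/* Plays the role of the grace period ending: readers are gone by now. */
static void demo_grace_period_end(void)
{
    while (demo_deferred) {
        struct demo_node *n = demo_deferred;

        demo_deferred = n->next_deferred;
        free(n);
        demo_freed++;
    }
}
```

    Between demo_node_free_ref() and demo_grace_period_end() the object is
    still valid memory, which is what a reader that found it earlier relies
    on. A "_now" variant that called free() at once would break that.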
     
  • It is not allowed to free the memory of an object which is part of a list
    which is protected by rcu-read-side-critical sections without making sure
    that no other context is accessing the object anymore. This usually happens
    by removing the references to this object and then waiting until the rcu
    grace period is over and no one (allowedly) accesses it anymore.

    But the _now functions ignore this completely. They free the object
    directly even when a different context still tries to access it. This has
    to be avoided and thus these functions must be removed and all functions
    have to use batadv_hardif_free_ref.

    Fixes: 89652331c00f ("batman-adv: split tq information in neigh_node struct")
    Signed-off-by: Sven Eckelmann
    Signed-off-by: Marek Lindner
    Signed-off-by: Antonio Quartulli

    Sven Eckelmann
     
  • It is not allowed to free the memory of an object which is part of a list
    which is protected by rcu-read-side-critical sections without making sure
    that no other context is accessing the object anymore. This usually happens
    by removing the references to this object and then waiting until the rcu
    grace period is over and no one (allowedly) accesses it anymore.

    But the _now functions ignore this completely. They free the object
    directly even when a different context still tries to access it. This has
    to be avoided and thus these functions must be removed and all functions
    have to use batadv_neigh_ifinfo_free_ref.

    Fixes: 89652331c00f ("batman-adv: split tq information in neigh_node struct")
    Signed-off-by: Sven Eckelmann
    Signed-off-by: Marek Lindner
    Signed-off-by: Antonio Quartulli

    Sven Eckelmann
     
  • It is not allowed to free the memory of an object which is part of a list
    which is protected by rcu-read-side-critical sections without making sure
    that no other context is accessing the object anymore. This usually happens
    by removing the references to this object and then waiting until the rcu
    grace period is over and no one (allowedly) accesses it anymore.

    But the _now functions ignore this completely. They free the object
    directly even when a different context still tries to access it. This has
    to be avoided and thus these functions must be removed and all functions
    have to use batadv_hardif_neigh_free_ref.

    Fixes: cef63419f7db ("batman-adv: add list of unique single hop neighbors per hard-interface")
    Signed-off-by: Sven Eckelmann
    Signed-off-by: Marek Lindner
    Signed-off-by: Antonio Quartulli

    Sven Eckelmann
     
  • It is not allowed to free the memory of an object which is part of a list
    which is protected by rcu-read-side-critical sections without making sure
    that no other context is accessing the object anymore. This usually happens
    by removing the references to this object and then waiting until the rcu
    grace period is over and no one (allowedly) accesses it anymore.

    But the _now functions ignore this completely. They free the object
    directly even when a different context still tries to access it. This has
    to be avoided and thus these functions must be removed and all functions
    have to use batadv_neigh_node_free_ref.

    Fixes: 89652331c00f ("batman-adv: split tq information in neigh_node struct")
    Signed-off-by: Sven Eckelmann
    Signed-off-by: Marek Lindner
    Signed-off-by: Antonio Quartulli

    Sven Eckelmann
     
  • It is not allowed to free the memory of an object which is part of a list
    which is protected by rcu-read-side-critical sections without making sure
    that no other context is accessing the object anymore. This usually happens
    by removing the references to this object and then waiting until the rcu
    grace period is over and no one (allowedly) accesses it anymore.

    But the _now functions ignore this completely. They free the object
    directly even when a different context still tries to access it. This has
    to be avoided and thus these functions must be removed and all functions
    have to use batadv_orig_ifinfo_free_ref.

    Fixes: 7351a4822d42 ("batman-adv: split out router from orig_node")
    Signed-off-by: Sven Eckelmann
    Signed-off-by: Marek Lindner
    Signed-off-by: Antonio Quartulli

    Sven Eckelmann
     
  • The batadv_nc_node_free_ref function uses call_rcu to delay the free of the
    batadv_nc_node object until no (already started) rcu_read_lock is enabled
    anymore. This makes sure that no context is still trying to access the
    object which should be removed. But batadv_nc_node also contains a
    reference to orig_node which must be removed.

    The reference drop of orig_node was done in the call_rcu function
    batadv_nc_node_free_rcu but should actually be done in the
    batadv_nc_node_release function to avoid nested call_rcus. This is
    important because rcu_barrier (e.g. batadv_softif_free or batadv_exit) will
    not detect the inner call_rcu as relevant for its execution. Otherwise this
    barrier will most likely be inserted in the queue before the callback of
    the first call_rcu was executed. The caller of rcu_barrier will therefore
    continue to run before the inner call_rcu callback finished.

    Fixes: d56b1705e28c ("batman-adv: network coding - detect coding nodes and remove these after timeout")
    Signed-off-by: Sven Eckelmann
    Signed-off-by: Marek Lindner
    Signed-off-by: Antonio Quartulli

    Sven Eckelmann
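
    The ordering problem described above can be modelled with a simple
    callback queue. This is a deliberately crude model, not how RCU is
    implemented: the barrier is treated as "run everything queued before
    me", and the queue, the structures and the demo_* names are all
    hypothetical. It contrasts dropping the orig_node reference inside the
    callback (nested) with dropping it at release time.

```c
#include <stddef.h>

#define DEMO_QUEUE_MAX 16

typedef void (*demo_cb_t)(void *arg);

struct demo_cb {
    demo_cb_t fn;
    void *arg;
};

static struct demo_cb demo_queue[DEMO_QUEUE_MAX];
static size_t demo_head, demo_tail;

/* Model of call_rcu(): append a callback to the queue. */
static void demo_call_rcu(demo_cb_t fn, void *arg)
{
    if (demo_tail < DEMO_QUEUE_MAX) {
        demo_queue[demo_tail].fn = fn;
        demo_queue[demo_tail].arg = arg;
        demo_tail++;
    }
}

/* Model of rcu_barrier(): waits only for callbacks queued before the call. */
static void demo_rcu_barrier(void)
{
    size_t stop = demo_tail;

    while (demo_head < stop) {
        struct demo_cb cb = demo_queue[demo_head++];

        cb.fn(cb.arg);
    }
}

static size_t demo_pending(void)
{
    return demo_tail - demo_head;
}

struct demo_orig { int freed; };
struct demo_nc_node { struct demo_orig *orig; int freed; };

static void demo_orig_free_rcu(void *arg)
{
    ((struct demo_orig *)arg)->freed = 1;
}

/* Dropping the (last) orig reference queues its own callback. */
static void demo_orig_free_ref(struct demo_orig *orig)
{
    demo_call_rcu(demo_orig_free_rcu, orig);
}

/* Old shape: the reference is dropped inside the callback (nested). */
static void demo_nc_free_rcu_nested(void *arg)
{
    struct demo_nc_node *nc = arg;

    demo_orig_free_ref(nc->orig);   /* queued after the barrier's cut-off */
    nc->freed = 1;
}

static void demo_nc_release_nested(struct demo_nc_node *nc)
{
    demo_call_rcu(demo_nc_free_rcu_nested, nc);
}

/* New shape: the reference is dropped at release time, not in the callback. */
static void demo_nc_free_rcu(void *arg)
{
    ((struct demo_nc_node *)arg)->freed = 1;
}

static void demo_nc_release(struct demo_nc_node *nc)
{
    demo_orig_free_ref(nc->orig);
    demo_call_rcu(demo_nc_free_rcu, nc);
}
```

    With the nested shape, one demo_rcu_barrier() returns while the inner
    callback is still pending; with the release-time shape, a single barrier
    covers both callbacks.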
     
  • The batadv_claim_free_ref function uses call_rcu to delay the free of the
    batadv_bla_claim object until no (already started) rcu_read_lock is enabled
    anymore. This makes sure that no context is still trying to access the
    object which should be removed. But batadv_bla_claim also contains a
    reference to backbone_gw which must be removed.

    The reference drop of backbone_gw was done in the call_rcu function
    batadv_claim_free_rcu but should actually be done in the
    batadv_claim_release function to avoid nested call_rcus. This is important
    because rcu_barrier (e.g. batadv_softif_free or batadv_exit) will not
    detect the inner call_rcu as relevant for its execution. Otherwise this
    barrier will most likely be inserted in the queue before the callback of
    the first call_rcu was executed. The caller of rcu_barrier will therefore
    continue to run before the inner call_rcu callback finished.

    Fixes: 23721387c409 ("batman-adv: add basic bridge loop avoidance code")
    Signed-off-by: Sven Eckelmann
    Acked-by: Simon Wunderlich
    Signed-off-by: Marek Lindner
    Signed-off-by: Antonio Quartulli

    Sven Eckelmann
     
  • Pull networking fixes from David Miller:
    "A quick set of bug fixes after the initial networking merge:

    1) Netlink multicast group storage allocator was only tested with
    nr_groups equal to 1; make it work for other values too. From
    Matti Vaittinen.

    2) Check build_skb() return value in macb and hip04_eth drivers, from
    Weidong Wang.

    3) Don't leak x25_asy on x25_asy_open() failure.

    4) More DMA map/unmap fixes in 3c59x from Neil Horman.

    5) Don't clobber IP skb control block during GSO segmentation, from
    Konstantin Khlebnikov.

    6) ECN helpers for ipv6 don't fixup the checksum, from Eric Dumazet.

    7) Fix SKB segment utilization estimation in xen-netback, from David
    Vrabel.

    8) Fix lockdep splat in bridge addrlist handling, from Nikolay
    Aleksandrov"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits)
    bgmac: Fix reversed test of build_skb() return value.
    bridge: fix lockdep addr_list_lock false positive splat
    net: smsc: Add support h8300
    xen-netback: free queues after freeing the net device
    xen-netback: delete NAPI instance when queue fails to initialize
    xen-netback: use skb to determine number of required guest Rx requests
    net: sctp: Move sequence start handling into sctp_transport_get_idx()
    ipv6: update skb->csum when CE mark is propagated
    net: phy: turn carrier off on phy attach
    net: macb: clear interrupts when disabling them
    sctp: support to lookup with ep+paddr in transport rhashtable
    net: hns: fixes no syscon error when init mdio
    dts: hisi: fixes no syscon fault when init mdio
    net: preserve IP control block during GSO segmentation
    fsl/fman: Delete one function call "put_device" in dtsec_config()
    hip04_eth: fix missing error handle for build_skb failed
    3c59x: fix another page map/single unmap imbalance
    3c59x: balance page maps and unmaps
    x25_asy: Free x25_asy on x25_asy_open() failure.
    mlxsw: fix SWITCHDEV_OBJ_ID_PORT_MDB
    ...

    Linus Torvalds