05 Jan, 2016

2 commits

  • On 2015/11/06, Dmitry Vyukov reported a deadlock involving the splice
    system call and AF_UNIX sockets,

    http://lists.openwall.net/netdev/2015/11/06/24

    The situation was analyzed as

    (a while ago) A: socketpair()
    B: splice() from a pipe to /mnt/regular_file
            does sb_start_write() on /mnt
    C: try to freeze /mnt
            wait for B to finish with /mnt
    A: bind() try to bind our socket to /mnt/new_socket_name
            lock our socket, see it not bound yet
            decide that it needs to create something in /mnt
            try to do sb_start_write() on /mnt, block (it's
            waiting for C).
    D: splice() from the same pipe to our socket
            lock the pipe, see that socket is connected
            try to lock the socket, block waiting for A
    B: get around to actually feeding a chunk from
            pipe to file, try to lock the pipe. Deadlock.

    on 2015/11/10 by Al Viro,

    http://lists.openwall.net/netdev/2015/11/10/4

    The patch fixes this by removing the kern_path_create related code from
    unix_mknod and executing it as part of unix_bind prior to acquiring the
    readlock of the socket in question. This means that A (as used above)
    will sb_start_write on /mnt before it acquires the readlock, hence, it
    won't indirectly block B which first did a sb_start_write and then
    waited for a thread trying to acquire the readlock. Consequently, A
    being blocked by C waiting for B won't cause a deadlock anymore
    (effectively, both A and B acquire two locks in opposite order in the
    situation described above).

    Dmitry Vyukov tested the original patch.

    Signed-off-by: Rainer Weikusat
    Signed-off-by: David S. Miller

    Rainer Weikusat
     
  • Commands run in a vrf context are not failing as expected on a route lookup:
    root@kenny:~# ip ro ls table vrf-red
    unreachable default

    root@kenny:~# ping -I vrf-red -c1 -w1 10.100.1.254
    ping: Warning: source address might be selected on device other than vrf-red.
    PING 10.100.1.254 (10.100.1.254) from 0.0.0.0 vrf-red: 56(84) bytes of data.

    --- 10.100.1.254 ping statistics ---
    2 packets transmitted, 0 received, 100% packet loss, time 999ms

    Since the vrf table does not have a route for 10.100.1.254 the ping
    should have failed. The saddr lookup causes a full VRF table lookup.
    Propagating a lookup failure to the user allows the command to fail as
    expected:

    root@kenny:~# ping -I vrf-red -c1 -w1 10.100.1.254
    connect: No route to host

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

31 Dec, 2015

2 commits

  • In sctp_close, sctp_make_abort_user may return NULL because of memory
    allocation failure. If this happens, it will bypass any state change
    and never free the assoc. The assoc has no chance to be freed and it
    will be kept in memory with the state it had even after the socket is
    closed by sctp_close().

    So if sctp_make_abort_user fails to allocate memory, we should abort
    the asoc via sctp_primitive_ABORT as well. Just like the annotation in
    sctp_sf_cookie_wait_prm_abort and sctp_sf_do_9_1_prm_abort said,
    "Even if we can't send the ABORT due to low memory delete the TCB.
    This is a departure from our typical NOMEM handling".

    But then the chunk is NULL (low memory) and the SCTP_CMD_REPLY cmd would
    dereference the chunk pointer and crash the system. So we should add
    SCTP_CMD_REPLY cmd only when the chunk is not NULL, just like other
    places where it adds SCTP_CMD_REPLY cmd.

    Signed-off-by: Xin Long
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Xin Long
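
The rule described in the sctp_close commit above (always tear down the association, but only queue a reply when a chunk was actually allocated) can be sketched in userspace C. This is a minimal illustration, not the kernel code: `cmd_seq_sketch`, `CMD_REPLY`, `CMD_DELETE_TCB`, and `prm_abort_sketch()` are hypothetical stand-ins for the SCTP command sequence and state function.

```c
#include <stddef.h>

/* Hypothetical command identifiers; not the kernel's SCTP_CMD_* values. */
enum cmd_sketch { CMD_REPLY, CMD_DELETE_TCB };

/* Hypothetical stand-in for a queued command sequence. */
struct cmd_seq_sketch {
    enum cmd_sketch cmds[4];
    size_t n;
};

/* chunk is NULL when allocating the ABORT chunk failed (low memory). */
void prm_abort_sketch(struct cmd_seq_sketch *seq, const void *chunk)
{
    seq->n = 0;

    /* Only queue the reply if there is a chunk to send. */
    if (chunk)
        seq->cmds[seq->n++] = CMD_REPLY;

    /* Delete the association state even if the ABORT could not be sent. */
    seq->cmds[seq->n++] = CMD_DELETE_TCB;
}
```

With a NULL chunk the sequence contains only the teardown command, so nothing later dereferences the missing chunk.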
     
  • Commit ceb5d58b2170 ("net: fix sock_wake_async() rcu protection") from
    the current 4.4 release cycle introduced a new flags member in
    struct socket_wq and moved SOCKWQ_ASYNC_NOSPACE and SOCKWQ_ASYNC_WAITDATA
    from struct socket's flags member into that new place.

    Unfortunately, the new flags field is never initialized properly, at least
    not for the struct socket_wq instance created in sock_alloc_inode().

    One particular issue I encountered because of this is that my GNU Emacs
    failed to draw anything on my desktop -- i.e. what I got is a transparent
    window, including the title bar. Bisection led to the commit mentioned
    above and further investigation by means of strace told me that Emacs
    is indeed speaking to my Xorg through an O_ASYNC AF_UNIX socket. This is
    reproducible 100% of the time and the fact that properly initializing the
    struct socket_wq ->flags fixes the issue leads me to the conclusion that
    somehow SOCKWQ_ASYNC_WAITDATA got set in the uninitialized ->flags,
    preventing my Emacs from receiving any SIGIO's due to data becoming
    available and it got stuck.

    Make sock_alloc_inode() set the newly created struct socket_wq's ->flags
    member to zero.

    Fixes: ceb5d58b2170 ("net: fix sock_wake_async() rcu protection")
    Signed-off-by: Nicolai Stange
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Nicolai Stange
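
The socket_wq fix above comes down to explicitly initializing every field of a freshly allocated object. A minimal userspace sketch of that pattern follows; `struct wq_sketch` and `wq_alloc()` are hypothetical names that only stand in for `struct socket_wq` and the allocation done in `sock_alloc_inode()`.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct socket_wq; not the kernel definition. */
struct wq_sketch {
    void *fasync_list;
    unsigned long flags;
};

/* malloc() returns memory with indeterminate contents, so each field
 * has to be set explicitly before the object is handed out. */
struct wq_sketch *wq_alloc(void)
{
    struct wq_sketch *wq = malloc(sizeof(*wq));

    if (!wq)
        return NULL;
    wq->fasync_list = NULL;
    wq->flags = 0;      /* the kind of assignment the fix adds */
    return wq;
}
```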
     

30 Dec, 2015

1 commit

  • Commit 5b48bb8506c5 ("openvswitch: Fix helper reference leak") fixed a
    reference leak on helper objects, but inadvertently introduced a leak on
    the ct template.

    Previously, ct_info.ct->general.use was initialized to 0 by
    nf_ct_tmpl_alloc() and only incremented when ovs_ct_copy_action()
    returned successful. If an error occurred while adding the helper or
    adding the action to the actions buffer, the __ovs_ct_free_action()
    cleanup would use nf_ct_put() to free the entry. However, this relies on
    atomic_dec_and_test(ct_info.ct->general.use). This reference must be
    incremented first, or nf_ct_put() will never free it.

    Fix the issue by acquiring a reference to the template immediately after
    allocation.

    Fixes: cae3a2627520 ("openvswitch: Allow attaching helpers to ct action")
    Fixes: 5b48bb8506c5 ("openvswitch: Fix helper reference leak")
    Signed-off-by: Joe Stringer
    Signed-off-by: David S. Miller

    Joe Stringer
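
Why a put on a counter that starts at 0 never frees the object can be shown with C11 atomics. This is a sketch under stated assumptions: `struct tmpl_sketch`, `tmpl_get()`, and `tmpl_put()` are hypothetical names modelling the decrement-and-test idiom the commit above describes, and the `freed` flag stands in for the actual free.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the ct template; only the counter matters. */
struct tmpl_sketch {
    atomic_int use;
    bool freed;
};

/* Take a reference. */
void tmpl_get(struct tmpl_sketch *t)
{
    atomic_fetch_add(&t->use, 1);
}

/* Drop a reference; release only when the count goes from 1 to 0. */
void tmpl_put(struct tmpl_sketch *t)
{
    if (atomic_fetch_sub(&t->use, 1) == 1)
        t->freed = true;    /* real code would free the object here */
}
```

Calling `tmpl_put()` on a counter that is still 0 moves it to -1 and skips the release, which is the leak; taking a reference right after allocation makes the matching put drop the count to 0.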
     

28 Dec, 2015

2 commits

  • Accepted or peeled off sockets were missing a security label (e.g.
    SELinux), which means that the socket was in "unlabeled" state.

    This patch clones the sock's label from the parent sock and resolves the
    issue (similar to AF_BLUETOOTH protocol family).

    Cc: Paul Moore
    Cc: David Teigland
    Signed-off-by: Marcelo Ricardo Leitner
    Acked-by: Paul Moore
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     
  • Commit cacc06215271 ("sctp: use GFP_USER for user-controlled kmalloc")
    missed two other spots.

    For connectx, as it's more likely to be used by kernel users of the API,
    it detects if GFP_USER should be used or not.

    Fixes: cacc06215271 ("sctp: use GFP_USER for user-controlled kmalloc")
    Reported-by: Dmitry Vyukov
    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     

24 Dec, 2015

1 commit

  • Marc Haber reported we don't honor interface indexes when we receive link
    local router addresses in router advertisements. Luckily the non-strict
    version of ipv6_chk_addr already does the correct job here, so we can
    simply use it to lighten the checks and use those addresses by default
    without any configuration change.

    Link:
    Reported-by: Marc Haber
    Cc: Marc Haber
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     

23 Dec, 2015

5 commits

  • When sysctl performs restrict writes, it allows writing from
    a middle position of a sysctl file, which requires us to initialize
    the table data before calling proc_dostring() for the write case.

    Fixes: 3d1bec99320d ("ipv6: introduce secret_stable to ipv6_devconf")
    Reported-by: Sasha Levin
    Acked-by: Hannes Frederic Sowa
    Tested-by: Sasha Levin
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    WANG Cong
     
  • Steffen Klassert says:

    ====================
    pull request (net): ipsec 2015-12-22

    Just one patch to fix dst_entries_init with multiple namespaces.
    From Dan Streetman.

    Please pull or let me know if there are problems.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • ip6addrlbl_get() has never worked. If ip6addrlbl_hold() succeeded,
    ip6addrlbl_get() will exit with '-ESRCH'. If ip6addrlbl_hold() failed,
    ip6addrlbl_get() will use an about-to-be-freed ip6addrlbl_entry pointer.

    Fix this by inverting ip6addrlbl_hold() check.

    Fixes: 2a8cc6c89039 ("[IPV6] ADDRCONF: Support RFC3484 configurable address selection policy table.")
    Signed-off-by: Andrey Ryabinin
    Reviewed-by: Cong Wang
    Acked-by: YOSHIFUJI Hideaki
    Signed-off-by: David S. Miller

    Andrey Ryabinin
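
The corrected check in the ip6addrlbl_get() commit above can be sketched as follows. The assumption, labeled as such, is that the hold helper returns true when it managed to take a reference (the count was not zero); `entry_sketch`, `entry_hold()`, and `lookup_sketch()` are hypothetical names, not the kernel functions.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted table entry. */
struct entry_sketch {
    atomic_int refcnt;
};

/* Returns true if a reference was taken, i.e. the count was not zero. */
bool entry_hold(struct entry_sketch *e)
{
    int old = atomic_load(&e->refcnt);

    while (old != 0) {
        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&e->refcnt, &old, old + 1))
            return true;
    }
    return false;
}

/* Fail only when the entry is missing or the hold did NOT succeed. */
int lookup_sketch(struct entry_sketch *e)
{
    if (e == NULL || !entry_hold(e))
        return -ESRCH;
    return 0;
}
```

Dropping the `!` in `lookup_sketch()` reproduces the original bug: a successful hold would return `-ESRCH`, and a failed one would continue with an entry that is about to be freed.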
     
  • The bridge's ageing time is offloaded to hardware when:
    1) A port joins a bridge
    2) The ageing time of the bridge is changed

    In the first case the ageing time is offloaded as jiffies, but in the
    second case it's offloaded as clock_t, which is what existing switchdev
    drivers expect to receive.

    Fixes: 6ac311ae8bfb ("Adding switchdev ageing notification on port bridged")
    Signed-off-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Ido Schimmel
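
The unit mismatch in the bridge ageing commit above is easy to see with a small conversion sketch. The tick rates here are assumptions chosen for illustration (1000 kernel ticks per second, 100 clock_t ticks per second), and `jiffies_to_clock_t_sketch()` is a hypothetical userspace helper, not the kernel function.

```c
/* Assumed tick rates for illustration; the kernel's HZ is set at build time. */
#define HZ_SKETCH       1000UL  /* kernel ticks (jiffies) per second */
#define USER_HZ_SKETCH  100UL   /* clock_t ticks per second */

/* Convert jiffies to clock_t units, assuming HZ is a multiple of USER_HZ. */
unsigned long jiffies_to_clock_t_sketch(unsigned long j)
{
    return j / (HZ_SKETCH / USER_HZ_SKETCH);
}
```

Under these assumptions, a 300-second ageing time is 300000 jiffies but 30000 clock_t ticks, so a driver expecting clock_t that receives jiffies would see a value ten times too large.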
     
  • Pablo Neira Ayuso says:

    ====================
    Netfilter fixes for net

    The following patchset contains two netfilter fixes:

    1) Oneliner to dump missing NFT_CT_L3PROTOCOL netlink attribute, from
    Florian Westphal.

    2) Another oneliner for nf_tables to use skb->protocol from the new
    netdev family, we can't assume ethernet there.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

19 Dec, 2015

2 commits


18 Dec, 2015

6 commits

  • One nft userspace test case fails with

    'ct l3proto original ipv4' mismatches 'ct l3proto ipv4'

    ... because NFTA_CT_DIRECTION attr is missing.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • Otherwise we may end up with incorrect network and transport header for
    other protocols.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • Pull networking fixes from David Miller:

    1) Fix uninitialized variable warnings in nfnetlink_queue, a lot of
    people reported this... From Arnd Bergmann.

    2) Don't init mutex twice in i40e driver, from Jesse Brandeburg.

    3) Fix spurious EBUSY in rhashtable, from Herbert Xu.

    4) Missing DMA unmaps in mvpp2 driver, from Marcin Wojtas.

    5) Fix race with work structure access in pppoe driver causing
    corruptions, from Guillaume Nault.

    6) Fix OOPS due to sh_eth_rx() not checking whether netdev_alloc_skb()
    actually succeeded or not, from Sergei Shtylyov.

    7) Don't lose flags when setting IFA_F_OPTIMISTIC in ipv6 code, from
    Bjørn Mork.

    8) VXLAN_HD_RCO defined incorrectly, fix from Jiri Benc.

    9) Fix clock source used for cookies in SCTP, from Marcelo Ricardo
    Leitner.

    10) aurora driver needs HAS_DMA dependency, from Geert Uytterhoeven.

    11) ndo_fill_metadata_dst op of vxlan has to handle ipv6 tunneling
    properly as well, from Jiri Benc.

    12) Handle request sockets properly in xfrm layer, from Eric Dumazet.

    13) Double stats update in ipv6 geneve transmit path, fix from Pravin B
    Shelar.

    14) sk->sk_policy[] needs RCU protection, and as a result
    xfrm_policy_destroy() needs to free policies using an RCU grace
    period, from Eric Dumazet.

    15) SCTP needs to clone ipv6 tx options in order to avoid use after
    free, from Eric Dumazet.

    16) Missing kbuild export of ila.h, from Stephen Hemminger.

    17) Missing mdiobus_alloc() return value checking in mdio-mux.c, from
    Tobias Klauser.

    18) Validate protocol value range in ->create() methods, from Hannes
    Frederic Sowa.

    19) Fix early socket demux races that result in illegal dst reuse, from
    Eric Dumazet.

    20) Validate socket address length in pptp code, from WANG Cong.

    21) skb_reorder_vlan_header() uses incorrect offset and can corrupt
    packets, from Vlad Yasevich.

    22) Fix memory leaks in nl80211 registry code, from Ola Olsson.

    23) Timeout loop count handling fixes in mISDN, xgbe, qlge, sfc, and
    qlcnic. From Dan Carpenter.

    24) msg.msg_iocb needs to be cleared in recvfrom() otherwise, for
    example, AF_ALG will interpret it as an async call. From Tadeusz
    Struk.

    25) inetpeer_set_addr_v4 forgets to initialize the 'vif' field, from
    Eric Dumazet.

    26) rhashtable does not enforce the minimum table size early enough,
    breaking how we calculate the per-cpu lock allocations. From
    Herbert Xu.

    27) Fix FCC port lockup in 82xx driver, from Martin Roth.

    28) FOU sockets need to be freed using RCU, from Hannes Frederic Sowa.

    29) Fix out-of-bounds access in __skb_complete_tx_timestamp() and
    sock_setsockopt() wrt. timestamp handling. From WANG Cong.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (117 commits)
    net: check both type and procotol for tcp sockets
    drivers: net: xgene: fix Tx flow control
    tcp: restore fastopen with no data in SYN packet
    af_unix: Revert 'lock_interruptible' in stream receive code
    fou: clean up socket with kfree_rcu
    82xx: FCC: Fixing a bug causing to FCC port lock-up
    gianfar: Don't enable RX Filer if not supported
    net: fix warnings in 'make htmldocs' by moving macro definition out of field declaration
    rhashtable: Fix walker list corruption
    rhashtable: Enforce minimum size on initial hash table
    inet: tcp: fix inetpeer_set_addr_v4()
    ipv6: automatically enable stable privacy mode if stable_secret set
    net: fix uninitialized variable issue
    bluetooth: Validate socket address length in sco_sock_bind().
    net_sched: make qdisc_tree_decrease_qlen() work for non mq
    ser_gigaset: remove unnecessary kfree() calls from release method
    ser_gigaset: fix deallocation of platform device structure
    ser_gigaset: turn nonsense checks into WARN_ON
    ser_gigaset: fix up NULL checks
    qlcnic: fix a timeout loop
    ...

    Linus Torvalds
     
  • Dmitry reported the following out-of-bound access:

    Call Trace:
    [] __asan_report_load4_noabort+0x3e/0x40
    mm/kasan/report.c:294
    [] sock_setsockopt+0x1284/0x13d0 net/core/sock.c:880
    [< inline >] SYSC_setsockopt net/socket.c:1746
    [] SyS_setsockopt+0x1fe/0x240 net/socket.c:1729
    [] entry_SYSCALL_64_fastpath+0x16/0x7a
    arch/x86/entry/entry_64.S:185

    This is because we mistake a raw socket as a tcp socket.
    We should check both sk->sk_type and sk->sk_protocol to ensure
    it is a tcp socket.

    Willem points out __skb_complete_tx_timestamp() needs to be fixed as well.

    Reported-by: Dmitry Vyukov
    Cc: Willem de Bruijn
    Cc: Eric Dumazet
    Signed-off-by: Cong Wang
    Acked-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    WANG Cong
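
The check described in the out-of-bounds commit above (a socket is TCP only if both its type and its protocol say so) can be sketched with the userspace socket constants. `is_tcp_sketch()` is a hypothetical helper that takes the two values directly rather than a kernel `struct sock`.

```c
#include <netinet/in.h>
#include <stdbool.h>
#include <sys/socket.h>

/* A raw socket opened with IPPROTO_TCP has the TCP protocol number but
 * is not a TCP socket, so the protocol alone is not enough. */
bool is_tcp_sketch(int type, int protocol)
{
    return type == SOCK_STREAM && protocol == IPPROTO_TCP;
}
```

For example, `is_tcp_sketch(SOCK_RAW, IPPROTO_TCP)` is false, which is the case a protocol-only test would misclassify.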
     
  • Yuchung tracked a regression caused by commit 57be5bdad759 ("ip: convert
    tcp_sendmsg() to iov_iter primitives") for TCP Fast Open.

    Some Fast Open users do not actually add any data in the SYN packet.

    Fixes: 57be5bdad759 ("ip: convert tcp_sendmsg() to iov_iter primitives")
    Reported-by: Yuchung Cheng
    Signed-off-by: Eric Dumazet
    Cc: Al Viro
    Acked-by: Yuchung Cheng
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • With b3ca9b02b00704053a38bfe4c31dbbb9c13595d0, the AF_UNIX SOCK_STREAM
    receive code was changed from using mutex_lock(&u->readlock) to
    mutex_lock_interruptible(&u->readlock) to prevent signals from being
    delayed for an indefinite time if a thread sleeping on the mutex
    happened to be selected for handling the signal. But this was never a
    problem with the stream receive code (as opposed to its datagram
    counterpart) as that never went to sleep waiting for new messages with the
    mutex held and thus, wouldn't cause secondary readers to block on the
    mutex waiting for the sleeping primary reader. As the interruptible
    locking makes the code more complicated in exchange for no benefit,
    change it back to using mutex_lock.

    Signed-off-by: Rainer Weikusat
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Rainer Weikusat
     

17 Dec, 2015

2 commits

  • fou->udp_offloads is managed by RCU. As it is actually included inside
    the fou sockets, we cannot let the memory go out of scope before a grace
    period. We either can synchronize_rcu or switch over to kfree_rcu to
    manage the sockets. kfree_rcu seems appropriate as it is used by vxlan
    and geneve.

    Fixes: 23461551c00628c ("fou: Support for foo-over-udp RX path")
    Cc: Tom Herbert
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     
  • …kernel/git/jberg/mac80211

    Johannes Berg says:

    ====================
    Another set of fixes:
    * memory leak fixes (from Ola)
    * operating mode notification spec compliance fix (from Eyal)
    * copy rfkill names in case pointer becomes invalid (myself)
    * two hardware restart fixes (myself)
    * get rid of "limiting TX power" log spam (myself)
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>

    David S. Miller
     

16 Dec, 2015

4 commits


15 Dec, 2015

11 commits

  • An AP can send an operating channel width change in a beacon
    opmode notification IE as long as there's a change in the nss as
    well (See 802.11ac-2013 section 10.41).
    So don't limit updating to nss only from an opmode notification IE.

    Signed-off-by: Eyal Shapira
    Signed-off-by: Emmanuel Grumbach
    Signed-off-by: Johannes Berg

    Eyal Shapira
     
  • When the AP is advertising limited TX power, the message can be
    printed over and over again. Suppress it when the power level
    isn't changing.

    This fixes https://bugzilla.kernel.org/show_bug.cgi?id=106011

    Signed-off-by: Johannes Berg

    Johannes Berg
     
  • During reprogramming, mac80211 currently first adds all the channel
    contexts, then binds them to the vifs and then goes to reconfigure
    all the interfaces. Drivers might, perhaps implicitly, rely on the
    operation order for certain things that typically happen within a
    single function elsewhere in mac80211. To avoid problems with that,
    reorder the code in mac80211's restart/reprogramming to work fully
    within the interface loop so that the order of operations is like
    in normal operation.

    For iwlwifi, this fixes a firmware crash when reprogramming with an
    AP/GO interface active.

    Reported-by: David Spinadel
    Signed-off-by: Johannes Berg

    Johannes Berg
     
  • When reconfiguration during resume fails while a scan is pending
    for completion work, that work will never run, and the scan will
    be stuck forever. Factor out the code to recover this and call it
    also in ieee80211_handle_reconfig_failure().

    Signed-off-by: Johannes Berg

    Johannes Berg
     
  • Free cached keys if the last early return path is taken.

    Signed-off-by: Ola Olsson
    Signed-off-by: Johannes Berg

    Ola Olsson
     
  • Compared to cfg80211_rdev_free_wowlan in core.h,
    the error goto label lacks the freeing of nd_config.
    Fix that.

    Signed-off-by: Ola Olsson
    Signed-off-by: Johannes Berg

    Ola Olsson
     
  • The first leak occurs when entering the default case
    in the switch for the initiator in set_regdom.
    The second leaks a platform_device struct if the
    platform registration in regulatory_init succeeds but
    the subsequent regulatory hint fails due to no memory.

    Signed-off-by: Ola Olsson
    Signed-off-by: Johannes Berg

    Ola Olsson
     
  • skb_reorder_vlan_header is called after the vlan header has
    been pulled. As a result the offset of the beginning of
    the mac header has been increased by 4 bytes (VLAN_HLEN).
    When moving the mac addresses, include this increase in
    the offset calculation so that the mac addresses are
    copied correctly.

    Fixes: a6e18ff1117 (vlan: Fix untag operations of stacked vlans with REORDER_HEADER off)
    CC: Nicolas Dichtel
    CC: Patrick McHardy
    Signed-off-by: Vladislav Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
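
The 4-byte shift mentioned in the skb_reorder_vlan_header commit above can be illustrated on a plain byte buffer. This is a simplified userspace sketch that works on a whole frame and knows nothing about `skb->data`; it is not the kernel function, and `strip_vlan_sketch()` and the `_SKETCH` constants are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

#define ETH_ALEN_SKETCH   6   /* bytes per MAC address */
#define VLAN_HLEN_SKETCH  4   /* TPID (2 bytes) + TCI (2 bytes) */

/* frame layout on entry: dst MAC, src MAC, VLAN tag, inner ethertype, payload.
 * The two MAC addresses are moved forward by VLAN_HLEN so that they sit
 * directly in front of the inner ethertype, overwriting the tag.
 * Returns the new start of the frame. */
uint8_t *strip_vlan_sketch(uint8_t *frame)
{
    /* memmove, not memcpy: source and destination ranges overlap. */
    memmove(frame + VLAN_HLEN_SKETCH, frame, 2 * ETH_ALEN_SKETCH);
    return frame + VLAN_HLEN_SKETCH;
}
```

If the destination offset omitted the `VLAN_HLEN` term, the addresses would land 4 bytes away from where the untagged header needs them, which is the class of error the commit fixes.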
     
  • David Wilder reported crashes caused by dst reuse.

    I am seeing a crash on a distro V4.2.3 kernel caused by a double
    release of a dst_entry. In ipv4_dst_destroy() the call to
    list_empty() finds a poisoned next pointer, indicating the dst_entry
    has already been removed from the list and freed. The crash occurs
    18 to 24 hours into a run of a network stress exerciser.

    Thanks to his detailed report and analysis, we were able to understand
    the core issue.

    IP early demux can associate a dst to skb, after a lookup in TCP/UDP
    sockets.

    When socket cache is not properly set, we want to store into
    sk->sk_dst_cache the dst for future IP early demux lookups,
    by acquiring a stable refcount on the dst.

    Problem is this acquisition is simply using an atomic_inc(),
    which works well, unless the dst was queued for destruction from
    dst_release() noticing dst refcount went to zero, if DST_NOCACHE
    was set on dst.

    We need to make sure current refcount is not zero before incrementing
    it, or risk double free as David reported.

    This patch, being a stable candidate, adds two new helpers, and uses
    them only from IP early demux problematic paths.

    It might be possible to merge in net-next skb_dst_force() and
    skb_dst_force_safe(), but I prefer having the smallest patch for stable
    kernels: maybe some skb_dst_force() callers do not expect skb->dst
    can suddenly be cleared.

    Can probably be backported back to linux-3.6 kernels.

    Reported-by: David J. Wilder
    Tested-by: David J. Wilder
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • 郭永刚 reported that one could simply crash the kernel as root by
    using a simple program:

    int socket_fd;
    struct sockaddr_in addr;
    addr.sin_port = 0;
    addr.sin_addr.s_addr = INADDR_ANY;
    addr.sin_family = 10;

    socket_fd = socket(10,3,0x40000000);
    connect(socket_fd , &addr,16);

    AF_INET, AF_INET6 sockets actually only support 8-bit protocol
    identifiers. inet_sock's skc_protocol field thus is sized accordingly,
    thus larger protocol identifiers simply cut off the higher bits and
    store a zero in the protocol fields.

    This could lead to e.g. NULL function pointer because as a result of
    the cut off inet_num is zero and we call down to inet_autobind, which
    is NULL for raw sockets.

    kernel: Call Trace:
    kernel: [] ? inet_autobind+0x2e/0x70
    kernel: [] inet_dgram_connect+0x54/0x80
    kernel: [] SYSC_connect+0xd9/0x110
    kernel: [] ? ptrace_notify+0x5b/0x80
    kernel: [] ? syscall_trace_enter_phase2+0x108/0x200
    kernel: [] SyS_connect+0xe/0x10
    kernel: [] tracesys_phase2+0x84/0x89

    I found no particular commit which introduced this problem.

    CVE: CVE-2015-8543
    Cc: Cong Wang
    Reported-by: 郭永刚
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
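
The truncation described in the CVE-2015-8543 commit above, and a range check that rejects such values up front, can be sketched as follows. The bound of 256 follows from the 8-bit field mentioned in the commit message; `IPPROTO_MAX_SKETCH`, `truncate_protocol_sketch()`, and `check_protocol_sketch()` are hypothetical names for illustration, not the kernel code.

```c
#include <errno.h>
#include <stdint.h>

/* One past the largest value an 8-bit protocol field can hold. */
#define IPPROTO_MAX_SKETCH 256

/* What storing the protocol into an 8-bit field does: high bits are lost. */
uint8_t truncate_protocol_sketch(int protocol)
{
    return (uint8_t)protocol;
}

/* Reject values that cannot be represented in the 8-bit field. */
int check_protocol_sketch(int protocol)
{
    if (protocol < 0 || protocol >= IPPROTO_MAX_SKETCH)
        return -EINVAL;
    return 0;
}
```

The reproducer's `0x40000000` truncates to 0 in the 8-bit field, while the range check refuses it before any socket state is set up.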
     
  • Pablo Neira Ayuso says:

    ====================
    netfilter fixes for net

    The following patchset contains Netfilter fixes for your net tree,
    specifically for nf_tables and nfnetlink_queue, they are:

    1) Avoid a compilation warning in nfnetlink_queue that was introduced
    in the previous merge window with the simplification of the conntrack
    integration, from Arnd Bergmann.

    2) nfnetlink_queue is leaking the pernet subsystem registration from
    a failure path, patch from Nikolay Borisov.

    3) Pass down netns pointer to batch callback in nfnetlink, this is the
    largest patch and it is not a bugfix but it is a dependency to
    resolve a splat in the correct way.

    4) Fix a splat due to incorrect socket memory accounting with nfnetlink
    skbuff clones.

    5) Add missing conntrack dependencies to NFT_DUP_IPV4 and NFT_DUP_IPV6.

    6) Traverse the nftables commit list in reverse order from the commit
    path, otherwise we crash when the user applies an incremental update
    via 'nft -f' that deletes an object that was just introduced in this
    batch, from Xin Long.

    Regarding the compilation warning fix, many people have sent us (and
    keep sending us) patches to address this, that's why I'm including this
    batch even if this is not critical.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

14 Dec, 2015

2 commits

  • The VRF driver cycles netdevs when an interface is enslaved or released:
    the down event is used to flush neighbor and route tables and the up
    event (if the interface was already up) effectively moves local and
    connected routes to the proper table.

    As of 4f823defdd5b the local route is left hanging around after a link
    down, so when a netdev is moved from one VRF to another (or released
    from a VRF altogether) local routes are left in the wrong table.

    Fix by handling the NETDEV_CHANGEUPPER event. When the upper dev is
    an L3mdev then call fib_disable_ip to flush all routes, local ones
    too.

    Fixes: 4f823defdd5b ("ipv4: fix to not remove local route on link down")
    Cc: Julian Anastasov
    Signed-off-by: David Ahern
    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    David Ahern
     
  • Jan Stancek reported that I wrecked things for him by fixing things for
    Vladimir :/

    His report was due to an UNINTERRUPTIBLE wait getting -EINTR, which
    should not be possible, however my previous patch made this possible by
    unconditionally checking signal_pending().

    We cannot use current->state as was done previously, because that
    variable can change as early as the instruction after the store to it.
    We must instead pass the initial state along and use that.

    Fixes: 68985633bccb ("sched/wait: Fix signal handling in bit wait helpers")
    Reported-by: Jan Stancek
    Reported-by: Chris Mason
    Tested-by: Jan Stancek
    Tested-by: Vladimir Murzin
    Tested-by: Chris Mason
    Reviewed-by: Paul Turner
    Cc: Ingo Molnar
    Cc: tglx@linutronix.de
    Cc: Oleg Nesterov
    Cc: hpa@zytor.com
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
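
The idea of passing the initial wait state along, rather than re-reading a variable that may already have changed, can be sketched in userspace C. Everything here is hypothetical: `wait_mode_sketch` stands in for the task state passed by the caller, `signal_pending` is a plain parameter instead of the kernel's signal_pending() check, and the actual sleeping is left out.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state the waiter was asked to sleep in. */
enum wait_mode_sketch { WAIT_INTERRUPTIBLE, WAIT_UNINTERRUPTIBLE };

/* The caller passes the mode it started with; the helper never consults
 * a shared "current state" variable that may have changed meanwhile. */
int bit_wait_sketch(enum wait_mode_sketch mode, bool signal_pending)
{
    /* ... sleeping elided in this sketch ... */

    /* Only an interruptible wait may be cut short by a signal. */
    if (mode == WAIT_INTERRUPTIBLE && signal_pending)
        return -EINTR;
    return 0;
}
```

An uninterruptible wait therefore never returns `-EINTR`, regardless of pending signals, which is the behaviour the report above expected.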