22 Nov, 2014

3 commits

  • Pull networking fixes from David Miller:

    1) Fix BUG when decrypting empty packets in mac80211, from Ronald Wahl.

    2) nf_nat_range is not fully initialized and this is copied back to
    userspace, from Daniel Borkmann.

    3) Fix read past end of buffer in netfilter ipset, from Dan
    Carpenter.

    4) Signed integer overflow in ipv4 address mask creation helper
    inet_make_mask(), from Vincent BENAYOUN.

    5) VXLAN, be2net, mlx4_en, and qlcnic need ->ndo_gso_check() methods to
    properly describe the device's capabilities, from Joe Stringer.

    6) Fix memory leaks and checksum miscalculations in openvswitch, from
    Pravin B Shelar and Jesse Gross.

    7) FIB rules passes back ambiguous error code for unreachable routes,
    making behavior confusing for userspace. Fix from Panu Matilainen.

    8) ieee802154fake_probe() doesn't release resources properly on error,
    from Alexey Khoroshilov.

    9) Fix skb_over_panic in add_grhead(), from Daniel Borkmann.

    10) Fix access of stale slave pointers in bonding code, from Nikolay
    Aleksandrov.

    11) Fix stack info leak in PPP pptp code, from Mathias Krause.

    12) Cure locking bug in IPX stack, from Jiri Bohac.

    13) Revert SKB fclone memory freeing optimization that is racy and can
    allow accesses to freed up memory, from Eric Dumazet.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (71 commits)
    tcp: Restore RFC5961-compliant behavior for SYN packets
    net: Revert "net: avoid one atomic operation in skb_clone()"
    virtio-net: validate features during probe
    cxgb4 : Fix DCB priority groups being returned in wrong order
    ipx: fix locking regression in ipx_sendmsg and ipx_recvmsg
    openvswitch: Don't validate IPv6 label masks.
    pptp: fix stack info leak in pptp_getname()
    brcmfmac: don't include linux/unaligned/access_ok.h
    cxgb4i : Don't block unload/cxgb4 unload when remote closes TCP connection
    ipv6: delete protocol and unregister rtnetlink when cleanup
    net/mlx4_en: Add VXLAN ndo calls to the PF net device ops too
    bonding: fix curr_active_slave/carrier with loadbalance arp monitoring
    mac80211: minstrel_ht: fix a crash in rate sorting
    vxlan: Inline vxlan_gso_check().
    can: m_can: update to support CAN FD features
    can: m_can: fix incorrect error messages
    can: m_can: add missing delay after setting CCCR_INIT bit
    can: m_can: fix not set can_dlc for remote frame
    can: m_can: fix possible sleep in napi poll
    can: m_can: add missing message RAM initialization
    ...

    Linus Torvalds
     
  • Commit c3ae62af8e755 ("tcp: should drop incoming frames without ACK
    flag set") was created to mitigate a security vulnerability in which a
    local attacker is able to inject data into locally-opened sockets by
    using TCP protocol statistics in procfs to quickly find the correct
    sequence number.

    This broke the RFC5961 requirement to send a challenge ACK in response
    to spurious RST packets, which was subsequently fixed by commit
    7b514a886ba50 ("tcp: accept RST without ACK flag").

    Unfortunately, the RFC5961 requirement that spurious SYN packets be
    handled in a similar manner remains broken.

    RFC5961 section 4 states that:

    ... the handling of the SYN in the synchronized state SHOULD be
    performed as follows:

    1) If the SYN bit is set, irrespective of the sequence number, TCP
    MUST send an ACK (also referred to as challenge ACK) to the remote
    peer:

    After sending the acknowledgment, TCP MUST drop the unacceptable
    segment and stop processing further.

    By sending an ACK, the remote peer is challenged to confirm the loss
    of the previous connection and the request to start a new connection.
    A legitimate peer, after restart, would not have a TCB in the
    synchronized state. Thus, when the ACK arrives, the peer should send
    a RST segment back with the sequence number derived from the ACK
    field that caused the RST.

    This RST will confirm that the remote peer has indeed closed the
    previous connection. Upon receipt of a valid RST, the local TCP
    endpoint MUST terminate its connection. The local TCP endpoint
    should then rely on SYN retransmission from the remote end to
    re-establish the connection.

    This patch lets SYN packets through the discard added in c3ae62af8e755,
    so that spurious SYN packets are properly dealt with as per the RFC.

    The challenge ACK is sent unconditionally and is rate-limited, so the
    original vulnerability is not reintroduced by this patch.
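
    The decision described above can be modelled roughly as follows. This
    is a toy sketch, not the kernel code: handle_segment(),
    ChallengeAckLimiter and the per-second limit are hypothetical names and
    values chosen for illustration.

```python
import time


class ChallengeAckLimiter:
    """Toy per-second limiter; the default limit is an illustrative assumption."""

    def __init__(self, limit_per_sec=100, clock=time.monotonic):
        self.limit = limit_per_sec
        self.clock = clock
        self.window = None
        self.count = 0

    def allow(self):
        now = int(self.clock())
        if now != self.window:
            # New one-second window: reset the counter.
            self.window, self.count = now, 0
        self.count += 1
        return self.count <= self.limit


def handle_segment(flags, limiter):
    """Decide what a socket in a synchronized state does with a segment.

    flags is a set drawn from {"SYN", "ACK", "RST"}.
    """
    # Frames with none of ACK, RST or SYN are still dropped early.
    if not ({"ACK", "RST", "SYN"} & flags):
        return "discard"
    if "SYN" in flags:
        # RFC 5961 section 4: challenge the peer, then drop the segment.
        return "challenge_ack" if limiter.allow() else "discard"
    # Everything else continues through the normal checks.
    return "process"
```

    In this model a bare SYN is answered with a challenge ACK regardless of
    its sequence number, and once the limiter is exhausted it is simply
    dropped.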

    Signed-off-by: Calvin Owens
    Acked-by: Eric Dumazet
    Acked-by: Neal Cardwell
    Signed-off-by: David S. Miller

    Calvin Owens
     
  • Not sure what I was thinking, but doing anything after
    releasing a refcount is suicidal or/and embarrassing.

    By the time we set skb->fclone to SKB_FCLONE_FREE, another cpu
    could have released last reference and freed whole skb.

    We potentially corrupt memory or trap if CONFIG_DEBUG_PAGEALLOC is set.

    Reported-by: Chris Mason
    Fixes: ce1a4ea3f1258 ("net: avoid one atomic operation in skb_clone()")
    Signed-off-by: Eric Dumazet
    Cc: Sabrina Dubroca
    Signed-off-by: David S. Miller

    Eric Dumazet
     

21 Nov, 2014

4 commits

  • Pablo Neira Ayuso says:

    ====================
    Netfilter fixes for net

    The following patchset contains two bugfixes for your net tree, they are:

    1) Validate netlink group from nfnetlink to avoid an out of bound array
    access. This should only happen with superuser privileges though.
    Discovered by Andrey Ryabinin using trinity.

    2) Don't push ethernet header before calling the netfilter output hook
    for multicast traffic, this breaks ebtables since it expects to see
    skb->data pointing to the network header, patch from Linus Luessing.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • John W. Linville says:

    ====================
    pull request: wireless 2014-11-20

    Please pull this little batch of fixes intended for the 3.18 stream!

    For the mac80211 patch, Johannes says:

    "Here's another last minute fix, for minstrel HT crashing
    depending on the value of some uninitialised stack."

    On top of that...

    Ben Greear fixes an ath9k regression in which a BSSID mask is
    miscalculated.

    Dmitry Torokhov corrects an error handling routine in brcmfmac which
    was checking an unsigned variable for a negative value.

    Johannes Berg avoids a build problem in brcmfmac for arches where
    linux/unaligned/access_ok.h and asm/unaligned.h conflict.

    Mathy Vanhoef addresses another brcmfmac issue so as to eliminate a
    use-after-free of the URB transfer buffer if a timeout occurs.

    Please let me know if there are problems!
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • This fixes an old regression introduced by commit
    b0d0d915 (ipx: remove the BKL).

    When a recvmsg syscall blocks waiting for new data, no data can be sent on the
    same socket with sendmsg because ipx_recvmsg() sleeps with the socket locked.

    This breaks mars-nwe (NetWare emulator):
    - the ncpserv process reads the request using recvmsg
    - ncpserv forks and spawns nwconn
    - ncpserv calls a (blocking) recvmsg and waits for new requests
    - nwconn deadlocks in sendmsg on the same socket

    Commit b0d0d915 has simply replaced BKL locking with
    lock_sock/release_sock. Unlike now, BKL got unlocked while
    sleeping, so a blocking recvmsg did not block a concurrent
    sendmsg.

    Only keep the socket locked while actually working with the socket data and
    release it prior to calling skb_recv_datagram().
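
    The locking pattern can be illustrated with a small user-space
    analogy. ToySocket is hypothetical: threading.Lock stands in for
    lock_sock()/release_sock(), and the queue loops sent data back to the
    same object purely to keep the sketch self-contained.

```python
import queue
import threading


class ToySocket:
    """Hypothetical stand-in for a datagram socket with a per-socket lock."""

    def __init__(self):
        self.lock = threading.Lock()   # plays the role of lock_sock()
        self.rx = queue.Queue()        # plays the role of the receive queue
        self.bound = True

    def recvmsg(self, timeout=None):
        with self.lock:
            # Only the socket state is inspected under the lock.
            if not self.bound:
                raise OSError("socket not bound")
        # The lock is released before sleeping, so a concurrent
        # sendmsg() on the same socket is not blocked.
        return self.rx.get(timeout=timeout)

    def sendmsg(self, data):
        with self.lock:
            self.rx.put(data)
```

    If recvmsg() kept the lock while waiting in rx.get(), the sendmsg()
    call from another thread would never get the lock, which is the
    deadlock described for nwconn above.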

    Signed-off-by: Jiri Bohac
    Reviewed-by: Arnd Bergmann
    Signed-off-by: David S. Miller

    Jiri Bohac
     
  • When userspace doesn't provide a mask, OVS datapath generates a fully
    unwildcarded mask for the flow by copying the flow and setting all bits
    in all fields. For IPv6 label, this creates a mask that matches on the
    upper 12 bits, causing the following error:

    openvswitch: netlink: Invalid IPv6 flow label value (value=ffffffff, max=fffff)

    This patch ignores the label validation check for masks, avoiding this
    error.
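
    A minimal sketch of the check, assuming a helper named
    validate_ipv6_label() (hypothetical, not the datapath function): the
    range test applies to key values only and is skipped for masks.

```python
IPV6_FLOW_LABEL_MAX = 0xFFFFF  # the flow label is a 20-bit field


def validate_ipv6_label(value, is_mask):
    """Return True if a 32-bit label word is acceptable."""
    if is_mask:
        # An all-ones mask generated for an exact-match flow is
        # legitimate, so the range check is skipped for masks.
        return True
    return value <= IPV6_FLOW_LABEL_MAX
```

    With value=0xffffffff the bits above the label are 0xfff00000, the
    upper 12 bits mentioned above; the value is rejected as a key but
    accepted as a mask.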

    Signed-off-by: Joe Stringer
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Joe Stringer
     

20 Nov, 2014

2 commits


19 Nov, 2014

1 commit

  • The commit 5935839ad73583781b8bbe8d91412f6826e218a4
    "mac80211: improve minstrel_ht rate sorting by throughput & probability"

    introduced a crash on rate sorting that occurs when the rate added to
    the sorting array is faster than all the previous rates. Due to an
    off-by-one error, it reads the rate index from tp_list[-1], which
    contains uninitialized stack garbage, and then uses the resulting index
    for accessing the group rate stats, leading to a crash if the garbage
    value is big enough.
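
    The bounds problem can be shown with a generic sorted insertion. This
    is a sketch under assumptions, not the minstrel_ht code:
    insert_by_throughput() and its arguments are hypothetical names.

```python
def insert_by_throughput(tp_list, index, throughput):
    """Insert `index` into tp_list, kept sorted by descending throughput.

    tp_list has a fixed length; the slowest entry falls off the end.
    `throughput` maps a rate index to its measured throughput.
    """
    j = len(tp_list)
    # The j > 0 test must come before tp_list[j - 1] is read, otherwise
    # a rate faster than every entry would index position -1.
    while j > 0 and throughput[index] > throughput[tp_list[j - 1]]:
        j -= 1
    if j < len(tp_list):
        # Shift slower entries down by one and drop the last.
        tp_list[j + 1:] = tp_list[j:-1]
        tp_list[j] = index
    return tp_list
```

    Note that Python would silently wrap index -1 to the last element,
    whereas in C the same read picks up whatever happens to sit in front
    of the array.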

    Cc: Thomas Huehn
    Reported-by: Jouni Malinen
    Signed-off-by: Felix Fietkau
    Signed-off-by: Johannes Berg

    Felix Fietkau
     

17 Nov, 2014

7 commits

  • Ebtables on the OUTPUT chain (NF_BR_LOCAL_OUT) would not work as expected
    for both locally generated IGMP and MLD queries. The IP header specific
    filter options are off by 14 Bytes for netfilter (actual output on
    interfaces is fine).

    NF_HOOK() expects the skb->data to point to the IP header, not the
    ethernet one (while dev_queue_xmit() does not). Luckily there is an
    br_dev_queue_push_xmit() helper function already - let's just use that.

    Introduced by eb1d16414339a6e113d89e2cca2556005d7ce919
    ("bridge: Add core IGMP snooping support")

    Ebtables example:

    $ ebtables -I OUTPUT -p IPv6 -o eth1 --logical-out br0 \
    --log --log-level 6 --log-ip6 --log-prefix="~EBT: " -j DROP

    before (broken):

    ~EBT: IN= OUT=eth1 MAC source = 02:04:64:a4:39:c2 \
    MAC dest = 33:33:00:00:00:01 proto = 0x86dd IPv6 \
    SRC=64a4:39c2:86dd:6000:0000:0020:0001:fe80 IPv6 \
    DST=0000:0000:0000:0004:64ff:fea4:39c2:ff02, \
    IPv6 priority=0x3, Next Header=2

    after (working):

    ~EBT: IN= OUT=eth1 MAC source = 02:04:64:a4:39:c2 \
    MAC dest = 33:33:00:00:00:01 proto = 0x86dd IPv6 \
    SRC=fe80:0000:0000:0000:0004:64ff:fea4:39c2 IPv6 \
    DST=ff02:0000:0000:0000:0000:0000:0000:0001, \
    IPv6 priority=0x0, Next Header=0
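
    The 14-byte shift can be reproduced in user space. The sketch below
    rebuilds the frame from the two log excerpts above and parses it once
    from the Ethernet header and once from the network header;
    parse_ipv6() and fmt_addr() are illustrative helpers, not ebtables
    code.

```python
ETH_HLEN = 14


def fmt_addr(raw):
    """Format 16 bytes as eight colon-separated 16-bit hex groups."""
    return ":".join(raw[i:i + 2].hex() for i in range(0, 16, 2))


def parse_ipv6(data):
    """Interpret `data` as if it started with an IPv6 header."""
    return {
        "priority": data[0] & 0x0F,
        "nexthdr": data[6],
        "src": fmt_addr(data[8:24]),
        "dst": fmt_addr(data[24:40]),
    }


frame = bytes.fromhex(
    "333300000001"                      # Ethernet destination MAC
    "020464a439c2"                      # Ethernet source MAC
    "86dd"                              # EtherType: IPv6
    "60000000"                          # version / traffic class / flow label
    "0020"                              # payload length
    "00"                                # next header (hop-by-hop)
    "01"                                # hop limit
    "fe80" "0000" "0000" "0000" "0004" "64ff" "fea4" "39c2"   # source
    "ff02" "0000" "0000" "0000" "0000" "0000" "0000" "0001"   # destination
)

broken = parse_ipv6(frame)               # data points at the Ethernet header
working = parse_ipv6(frame[ETH_HLEN:])   # data points at the network header
```

    Here broken["src"] and broken["dst"] come out as the addresses in the
    "before" log, with the source MAC and EtherType bleeding into the
    source address, while working matches the "after" log.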

    Signed-off-by: Linus Lüssing
    Acked-by: Herbert Xu
    Signed-off-by: Pablo Neira Ayuso

    Linus Lüssing
     
  • Make sure the netlink group exists, otherwise you can trigger an out
    of bound array memory access from the netlink_bind() path. This splat
    can only be triggered by the superuser.

    [ 180.203600] UBSan: Undefined behaviour in ../net/netfilter/nfnetlink.c:467:28
    [ 180.204249] index 9 is out of range for type 'int [9]'
    [ 180.204697] CPU: 0 PID: 1771 Comm: trinity-main Not tainted 3.18.0-rc4-mm1+ #122
    [ 180.205365] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org
    +04/01/2014
    [ 180.206498] 0000000000000018 0000000000000000 0000000000000009 ffff88007bdf7da8
    [ 180.207220] ffffffff82b0ef5f 0000000000000092 ffffffff845ae2e0 ffff88007bdf7db8
    [ 180.207887] ffffffff8199e489 ffff88007bdf7e18 ffffffff8199ea22 0000003900000000
    [ 180.208639] Call Trace:
    [ 180.208857] dump_stack (lib/dump_stack.c:52)
    [ 180.209370] ubsan_epilogue (lib/ubsan.c:174)
    [ 180.209849] __ubsan_handle_out_of_bounds (lib/ubsan.c:400)
    [ 180.210512] nfnetlink_bind (net/netfilter/nfnetlink.c:467)
    [ 180.210986] netlink_bind (net/netlink/af_netlink.c:1483)
    [ 180.211495] SYSC_bind (net/socket.c:1541)

    Moreover, define the missing nf_tables and nf_acct multicast groups too.
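
    The validation amounts to a range check before the table lookup. In
    the sketch below the table contents and the value of NFNLGRP_MAX are
    assumptions chosen to match the 9-entry array in the splat, and
    group_to_subsys() is a hypothetical name.

```python
NFNLGRP_NONE = 0
NFNLGRP_MAX = 8   # assumption for this sketch: valid groups are 1..8

# One slot per group number, including the unused slot 0 (9 entries).
GROUP_TO_SUBSYS = [None] + [f"subsys{g}" for g in range(1, NFNLGRP_MAX + 1)]


def group_to_subsys(group):
    """Return the subsystem for `group`, or None if the group is invalid."""
    if group <= NFNLGRP_NONE or group > NFNLGRP_MAX:
        return None
    return GROUP_TO_SUBSYS[group]
```

    Without the check, group 9 would index one element past the end of the
    table, which is the access UBSan reported.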

    Reported-by: Andrey Ryabinin
    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • It has been reported that generating an MLD listener report on
    devices with large MTUs (e.g. 9000) and a high number of IPv6
    addresses can trigger a skb_over_panic():

    skbuff: skb_over_panic: text:ffffffff80612a5d len:3776 put:20
    head:ffff88046d751000 data:ffff88046d751010 tail:0xed0 end:0xec0
    dev:port1
    ------------[ cut here ]------------
    kernel BUG at net/core/skbuff.c:100!
    invalid opcode: 0000 [#1] SMP
    Modules linked in: ixgbe(O)
    CPU: 3 PID: 0 Comm: swapper/3 Tainted: G O 3.14.23+ #4
    [...]
    Call Trace:

    [] ? skb_put+0x3a/0x3b
    [] ? add_grhead+0x45/0x8e
    [] ? add_grec+0x394/0x3d4
    [] ? mld_ifc_timer_expire+0x195/0x20d
    [] ? mld_dad_timer_expire+0x45/0x45
    [] ? call_timer_fn.isra.29+0x12/0x68
    [] ? run_timer_softirq+0x163/0x182
    [] ? __do_softirq+0xe0/0x21d
    [] ? irq_exit+0x4e/0xd3
    [] ? smp_apic_timer_interrupt+0x3b/0x46
    [] ? apic_timer_interrupt+0x6a/0x70

    mld_newpack() skb allocations are usually requested with dev->mtu
    in size, since commit 72e09ad107e7 ("ipv6: avoid high order allocations")
    we have changed the limit in order to be less likely to fail.

    However, in MLD/IGMP code, we have some rather ugly AVAILABLE(skb)
    macros, which determine if we may end up doing an skb_put() for
    adding another record. To avoid possible fragmentation, we check
    the skb's tailroom as skb->dev->mtu - skb->len, which is a wrong
    assumption as the actual max allocation size can be much smaller.

    The IGMP case doesn't have this issue as commit 57e1ab6eaddc
    ("igmp: refine skb allocations") stores the allocation size in
    the cb[].

    Set a reserved_tailroom to make it fit into the MTU and use
    skb_availroom() helper instead. This also allows to get rid of
    igmp_skb_size().
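
    The mismatch can be checked against the numbers in the panic report
    above. This is plain arithmetic rather than kernel code; the MTU of
    9000 is taken from the "e.g." in the description and is therefore an
    assumption, and the function names are hypothetical.

```python
def old_available(mtu, skb_len):
    """What the AVAILABLE(skb) check assumed: room up to the device MTU."""
    return mtu - skb_len


def real_tailroom(end, tail):
    """What the buffer actually has left."""
    return end - tail


# Values reconstructed from the skb_over_panic report.
headroom = 0x10            # data - head
end = 0xEC0                # end offset of the buffer
put = 20                   # bytes add_grhead() tried to append
len_after = 3776           # skb->len as reported (already includes `put`)
len_before = len_after - put
tail_before = headroom + len_before
mtu = 9000                 # assumption: the example MTU from the description

assumed = old_available(mtu, len_before)     # far more than 20 bytes
actual = real_tailroom(end, tail_before)     # only a few bytes left
```

    The MTU-based estimate says thousands of bytes are free, while the
    buffer has only 4 bytes left, so the 20-byte skb_put() moves the tail
    to 0xed0, past end at 0xec0.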

    Reported-by: Wei Liu
    Fixes: 72e09ad107e7 ("ipv6: avoid high order allocations")
    Signed-off-by: Daniel Borkmann
    Cc: Eric Dumazet
    Cc: Hannes Frederic Sowa
    Cc: David L Stevens
    Acked-by: Eric Dumazet
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • Pravin B Shelar says:

    ====================
    Open vSwitch

    Following fixes are accumulated in ovs-repo.
    Three of them are related to protocol processing, one is
    related to memory leak in case of error and one is to
    fix race.
    Patch "Validate IPv6 flow key and mask values" has conflicts
    with net-next, Let me know if you want me to send the patch
    for net-next.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Solves possible lockup issues that can be seen from firmware DCB agents calling
    into the DCB app api.

    DCB firmware event queues can be tied in with NAPI so that dcb events are
    generated in softIRQ context. This can result in calls to dcb_*app()
    functions which try to take the dcb_lock.

    If the event triggers while we also have the dcb_lock because lldpad or
    some other agent happened to be issuing a get/set command we could see a cpu
    lockup.

    This code was not originally written with firmware agents in mind, hence
    grabbing dcb_lock from softIRQ context was not considered.

    Signed-off-by: Anish Bhatt
    Signed-off-by: David S. Miller

    Anish Bhatt
     
  • Pablo Neira Ayuso says:

    ====================
    Netfilter/IPVS fixes for net

    The following patchset contains Netfilter updates for your net tree,
    they are:

    1) Fix missing initialization of the range structure (allocated in the
    stack) in nft_masq_{ipv4, ipv6}_eval, from Daniel Borkmann.

    2) Make sure the data we receive from userspace contains the req_version
    structure, otherwise return an error on truncated input.
    From Dan Carpenter.

    3) Fix handling of skb->sk which may cause incorrect handling
    of connections from a local process. Via Simon Horman, patch from
    Calvin Owens.

    4) Fix wrong netns in nft_compat when setting target and match params
    structure.

    5) Relax chain type validation in nft_compat that was recently included,
    this broke the matches that need to be run from the route chain type.
    Now iptables-test.py automated regression tests report success again
    and we avoid the only possible problematic case, which is the use of
    nat targets out of nat chain type.

    6) Use match->table to validate the tablename, instead of the match->name.
    Again patch for nft_compat.

    7) Restore the synchronous release of objects from the commit and abort
    path in nf_tables. This is causing two major problems: splats when using
    nft_compat, given that matches and targets may sleep and call_rcu is
    invoked from softirq context. Moreover Patrick reported possible event
    notification reordering when rules refer to anonymous sets.

    8) Fix race condition in between packets that are being confirmed by
    conntrack and the ctnetlink flush operation. This happens since the
    removal of the central spinlock. Thanks to Jesper D. Brouer for looking
    into this.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Trying to add an unreachable route incorrectly returns -ESRCH if
    custom FIB rules are present:

    [root@localhost ~]# ip route add 74.125.31.199 dev eth0 via 1.2.3.4
    RTNETLINK answers: Network is unreachable
    [root@localhost ~]# ip rule add to 55.66.77.88 table 200
    [root@localhost ~]# ip route add 74.125.31.199 dev eth0 via 1.2.3.4
    RTNETLINK answers: No such process
    [root@localhost ~]#

    Commit 83886b6b636173b206f475929e58fac75c6f2446 ("[NET]: Change "not found"
    return value for rule lookup") changed fib_rules_lookup()
    to use -ESRCH as a "not found" code internally, but for user space it
    should be translated into -ENETUNREACH. Handle the translation centrally in
    ipv4-specific fib_lookup(), leaving the DECnet case alone.
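
    The translation can be sketched as a thin wrapper. The two functions
    below only borrow the kernel names for readability; their bodies and
    the set-based "tables" are hypothetical.

```python
import errno


def fib_rules_lookup(tables, dst):
    """Toy rule walk: return 0 on a match, -ESRCH when nothing matches."""
    for table in tables:
        if dst in table:
            return 0
    return -errno.ESRCH


def fib_lookup(tables, dst):
    """Map the internal "not found" code to what user space should see."""
    err = fib_rules_lookup(tables, dst)
    if err == -errno.ESRCH:
        err = -errno.ENETUNREACH
    return err
```

    Doing the mapping in one place means every caller of the wrapper sees
    -ENETUNREACH, and -ESRCH stays an internal detail of the rule walk.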

    On a related note, commit b7a71b51ee37d919e4098cd961d59a883fd272d8
    ("ipv4: removed redundant conditional") removed a similar translation from
    ip_route_input_slow() prematurely AIUI.

    Fixes: b7a71b51ee37 ("ipv4: removed redundant conditional")
    Signed-off-by: Panu Matilainen
    Signed-off-by: David S. Miller

    Panu Matilainen
     

16 Nov, 2014

1 commit

  • Pull NFS client bugfixes from Trond Myklebust:
    "Highlights include:

    - stable patches to fix NFSv4.x delegation reclaim error paths
    - fix a bug whereby we were advertising NFSv4.1 but using NFSv4.2
    features
    - fix a use-after-free problem with pNFS block layouts
    - fix a memory leak in the pNFS files O_DIRECT code
    - replace an intrusive and Oops-prone performance fix in the NFSv4
    atomic open code with a safer one-line version and revert the two
    original patches"

    * tag 'nfs-for-3.18-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
    sunrpc: fix sleeping under rcu_read_lock in gss_stringify_acceptor
    NFS: Don't try to reclaim delegation open state if recovery failed
    NFSv4: Ensure that we call FREE_STATEID when NFSv4.x stateids are revoked
    NFSv4: Fix races between nfs_remove_bad_delegation() and delegation return
    NFSv4.1: nfs41_clear_delegation_stateid shouldn't trust NFS_DELEGATED_STATE
    NFSv4: Ensure that we remove NFSv4.0 delegations when state has expired
    NFS: SEEK is an NFS v4.2 feature
    nfs: Fix use of uninitialized variable in nfs_getattr()
    nfs: Remove bogus assignment
    nfs: remove spurious WARN_ON_ONCE in write path
    pnfs/blocklayout: serialize GETDEVICEINFO calls
    nfs: fix pnfs direct write memory leak
    Revert "NFS: nfs4_do_open should add negative results to the dcache."
    Revert "NFS: remove BUG possibility in nfs4_open_and_get_state"
    NFSv4: Ensure nfs_atomic_open set the dentry verifier on ENOENT

    Linus Torvalds
     

15 Nov, 2014

7 commits

  • Reject flow label key and mask values with invalid bits set.
    Introduced by commit 3fdbd1ce11e5 ("openvswitch: add ipv6 'set'
    action").

    Signed-off-by: Jarno Rajahalme
    Acked-by: Jesse Gross
    Signed-off-by: Pravin B Shelar

    Jarno Rajahalme
     
  • dp read operations depend on ovs_dp_cmd_fill_info(). This API
    needs to look up vport to find dp name, but vport lookup can
    fail. Therefore to keep vport reference alive we need to
    take ovs lock.

    Introduced by commit 6093ae9abac1 ("openvswitch: Minimize
    dp and vport critical sections").

    Signed-off-by: Pravin B Shelar
    Acked-by: Andy Zhou

    Pravin B Shelar
     
  • match_validate() enforces that a mask matching on NDP attributes has also an
    exact match on ICMPv6 type.
    The ICMPv6 type, which is 8-bit wide, is stored in the 'tp.src' field of
    'struct sw_flow_key', which is 16-bit wide.
    Therefore, an exact match on ICMPv6 type should only check the first 8 bits.

    This commit fixes a bug that prevented flows with an exact match on NDP field
    from being installed.
    Introduced by commit 03f0d916aa03 ("openvswitch: Mega flow implementation").
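
    A minimal sketch of the width issue, assuming the 8-bit type is
    widened into the 16-bit field in network byte order (an assumption
    about the storage format, as is the helper name):

```python
import socket

# Exact match on an 8-bit value that lives in a 16-bit field.
ICMPV6_TYPE_EXACT = socket.htons(0xFF)


def is_exact_icmpv6_type_match(tp_src_mask):
    """True if the 16-bit tp.src mask covers exactly the 8 type bits."""
    return tp_src_mask == ICMPV6_TYPE_EXACT
```

    A check that demanded a full 16-bit mask (0xffff) would never accept
    a mask that covers only the 8 meaningful bits, so such flows could not
    be installed.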

    Signed-off-by: Daniele Di Proietto
    Signed-off-by: Pravin B Shelar

    Daniele Di Proietto
     
  • The checksum of ICMPv6 packets uses the IP pseudoheader as part of
    the calculation, unlike ICMP in IPv4. This was not implemented,
    which means that modifying the IP addresses of an ICMPv6 packet
    would cause the checksum to no longer be correct as the pseudoheader
    did not match.
    Introduced by commit 3fdbd1ce11e5 ("openvswitch: add ipv6 'set' action").
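
    Why an address rewrite invalidates the checksum can be seen from a
    straightforward user-space computation. This is an illustrative
    sketch with hypothetical helper names, not the datapath code, which
    would update the checksum incrementally.

```python
import ipaddress
import struct


def ones_complement_sum(data):
    """16-bit one's-complement sum of `data`, padded to an even length."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def icmpv6_checksum(src, dst, icmp):
    """Checksum over the IPv6 pseudoheader plus the ICMPv6 message.

    `icmp` must have its checksum field (bytes 2-3) set to zero.
    """
    pseudo = (
        ipaddress.IPv6Address(src).packed
        + ipaddress.IPv6Address(dst).packed
        + struct.pack("!I", len(icmp))       # upper-layer packet length
        + b"\x00\x00\x00" + bytes([58])      # zero padding, next header = ICMPv6
    )
    return ~ones_complement_sum(pseudo + icmp) & 0xFFFF
```

    Calling icmpv6_checksum() on the same ICMPv6 message with a different
    source or destination address generally gives a different result, so
    a 'set' action that rewrites addresses also has to fix up the ICMPv6
    checksum.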

    Reported-by: Neal Shrader
    Signed-off-by: Jesse Gross
    Signed-off-by: Pravin B Shelar

    Jesse Gross
     
  • Need to free memory in case of sample action error.

    Introduced by commit 651887b0c22cffcfce7eb9c ("openvswitch: Sample
    action without side effects").

    Signed-off-by: Pravin B Shelar

    Pravin B Shelar
     
  • John W. Linville says:

    ====================
    pull request: wireless 2014-11-13

    Please pull this set of a few more wireless fixes intended for the
    3.18 stream...

    For the mac80211 bits, Johannes says:

    "This has just one fix, for an issue with the CCMP decryption
    that can cause a kernel crash. I'm not sure it's remotely
    exploitable, but it's an important fix nonetheless."

    For the iwlwifi bits, Emmanuel says:

    "Two fixes here - we weren't updating mac80211 if a scan
    was cut short by RFKILL which confused cfg80211. As a
    result, the latter wouldn't allow to run another scan.
    Liad fixes a small bug in the firmware dump."

    On top of that...

    Arend van Spriel corrects a channel width conversion that caused a
    WARNING in brcmfmac.

    Hauke Mehrtens avoids a NULL pointer dereference in b43.

    Larry Finger hits a trio of rtlwifi bugs left over from recent
    backporting from the Realtek vendor driver.

    Miaoqing Pan fixes a clocking problem in ath9k that could affect
    packet timestamps and such.

    Stanislaw Gruszka addresses a payload alignment issue that has been
    plaguing rt2x00.

    Please let me know if there are problems!
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • After removal of the central spinlock nf_conntrack_lock, in
    commit 93bb0ceb75be2 ("netfilter: conntrack: remove central
    spinlock nf_conntrack_lock"), it is possible to race against
    get_next_corpse().

    The race is against the get_next_corpse() cleanup on
    the "unconfirmed" list (a per-cpu list with separate locking),
    which sets the DYING bit.

    Fix this race, in __nf_conntrack_confirm(), by removing the CT
    from unconfirmed list before checking the DYING bit. In case the
    race occurred, re-add the CT to the dying list.

    While at this, fix coding style of the comment that has been
    updated.

    Fixes: 93bb0ceb75be2 ("netfilter: conntrack: remove central spinlock nf_conntrack_lock")
    Reported-by: bill bonaparte
    Signed-off-by: bill bonaparte
    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: Pablo Neira Ayuso

    bill bonaparte
     

14 Nov, 2014

6 commits

  • Pull networking fixes from David Miller:

    1) sunhme driver lacks DMA mapping error checks, based upon a report by
    Meelis Roos.

    2) Fix memory leak in mvpp2 driver, from Sudip Mukherjee.

    3) DMA memory allocation sizes are wrong in systemport ethernet driver,
    fix from Florian Fainelli.

    4) Fix use after free in mac80211 defragmentation code, from Johannes
    Berg.

    5) Some networking uapi headers missing from Kbuild file, from Stephen
    Hemminger.

    6) TUN driver gets csum_start offset wrong when VLAN accel is enabled,
    and macvtap has a similar bug, from Herbert Xu.

    7) Adjust several tunneling drivers to set dev->iflink after registry,
    because registry sets that to -1 overwriting whatever we did. From
    Steffen Klassert.

    8) Geneve forgets to set inner tunneling type, causing GSO segmentation
    to fail on some NICs. From Jesse Gross.

    9) Fix several locking bugs in stmmac driver, from Fabrice Gasnier and
    Giuseppe CAVALLARO.

    10) Fix spurious timeouts with NewReno on low traffic connections, from
    Marcelo Leitner.

    11) Fix descriptor updates in enic driver, from Govindarajulu
    Varadarajan.

    12) PPP calls bpf_prog_create() with locks held, which isn't kosher.
    Fix from Takashi Iwai.

    13) Fix NULL deref in SCTP with malformed INIT packets, from Daniel
    Borkmann.

    14) psock_fanout selftest accesses past the end of the mmap ring, fix
    from Shuah Khan.

    15) Fix PTP timestamping for VLAN packets, from Richard Cochran.

    16) netlink_unbind() calls in netlink pass wrong initial argument, from
    Hiroaki SHIMODA.

    17) vxlan socket reuse accidentally reuses a socket when the address
    family is different, so we have to explicitly check this, from
    Marcelo Leitner.

    18) Fix missing include in nft_reject_bridge.c breaking the build on ppc
    and other architectures, from Guenter Roeck.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (75 commits)
    vxlan: Do not reuse sockets for a different address family
    smsc911x: power-up phydev before doing a software reset.
    lib: rhashtable - Remove weird non-ASCII characters from comments
    net/smsc911x: Fix delays in the PHY enable/disable routines
    net/smsc911x: Fix rare soft reset timeout issue due to PHY power-down mode
    netlink: Properly unbind in error conditions.
    net: ptp: fix time stamp matching logic for VLAN packets.
    cxgb4 : dcb open-lldp interop fixes
    selftests/net: psock_fanout seg faults in sock_fanout_read_ring()
    net: bcmgenet: apply MII configuration in bcmgenet_open()
    net: bcmgenet: connect and disconnect from the PHY state machine
    net: qualcomm: Fix dependency
    ixgbe: phy: fix uninitialized status in ixgbe_setup_phy_link_tnx
    net: phy: Correctly handle MII ioctl which changes autonegotiation.
    ipv6: fix IPV6_PKTINFO with v4 mapped
    net: sctp: fix memory leak in auth key management
    net: sctp: fix NULL pointer dereference in af->from_addr_param on malformed packet
    net: ppp: Don't call bpf_prog_create() in ppp_lock
    net/mlx4_en: Advertize encapsulation offloads features only when VXLAN tunnel is set
    cxgb4 : Fix bug in DCB app deletion
    ...

    Linus Torvalds
     
  • No reason to use BUG_ON for osd request list assertions.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Alex Elder

    Ilya Dryomov
     
  • kick_requests() can put linger requests on the notarget list. This
    means we need to clear the much-overloaded req->r_req_lru_item in
    __unregister_linger_request() as well, or we get an assertion failure
    in ceph_osdc_release_request() - !list_empty(&req->r_req_lru_item).

    AFAICT the assumption was that registered linger requests cannot be on
    any of req->r_req_lru_item lists, but that's clearly not the case.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Alex Elder

    Ilya Dryomov
     
  • Requests have to be unlinked from both osd->o_requests (normal
    requests) and osd->o_linger_requests (linger requests) lists when
    clearing req->r_osd. Otherwise __unregister_linger_request() gets
    confused and we trip over a !list_empty(&osd->o_linger_requests)
    assert in __remove_osd().

    MON=1 OSD=1:

    # cat remove-osd.sh
    #!/bin/bash
    rbd create --size 1 test
    DEV=$(rbd map test)
    ceph osd out 0
    sleep 3
    rbd map dne/dne # obtain a new osdmap as a side effect
    rbd unmap $DEV & # will block
    sleep 3
    ceph osd in 0

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Alex Elder

    Ilya Dryomov
     
  • Large (greater than 32k, the value of PAGE_ALLOC_COSTLY_ORDER) auth
    tickets will have their buffers vmalloc'ed, which leads to the
    following crash in crypto:

    [ 28.685082] BUG: unable to handle kernel paging request at ffffeb04000032c0
    [ 28.686032] IP: [] scatterwalk_pagedone+0x22/0x80
    [ 28.686032] PGD 0
    [ 28.688088] Oops: 0000 [#1] PREEMPT SMP
    [ 28.688088] Modules linked in:
    [ 28.688088] CPU: 0 PID: 878 Comm: kworker/0:2 Not tainted 3.17.0-vm+ #305
    [ 28.688088] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
    [ 28.688088] Workqueue: ceph-msgr con_work
    [ 28.688088] task: ffff88011a7f9030 ti: ffff8800d903c000 task.ti: ffff8800d903c000
    [ 28.688088] RIP: 0010:[] [] scatterwalk_pagedone+0x22/0x80
    [ 28.688088] RSP: 0018:ffff8800d903f688 EFLAGS: 00010286
    [ 28.688088] RAX: ffffeb04000032c0 RBX: ffff8800d903f718 RCX: ffffeb04000032c0
    [ 28.688088] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8800d903f750
    [ 28.688088] RBP: ffff8800d903f688 R08: 00000000000007de R09: ffff8800d903f880
    [ 28.688088] R10: 18df467c72d6257b R11: 0000000000000000 R12: 0000000000000010
    [ 28.688088] R13: ffff8800d903f750 R14: ffff8800d903f8a0 R15: 0000000000000000
    [ 28.688088] FS: 00007f50a41c7700(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000
    [ 28.688088] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    [ 28.688088] CR2: ffffeb04000032c0 CR3: 00000000da3f3000 CR4: 00000000000006b0
    [ 28.688088] Stack:
    [ 28.688088] ffff8800d903f698 ffffffff81392ca8 ffff8800d903f6e8 ffffffff81395d32
    [ 28.688088] ffff8800dac96000 ffff880000000000 ffff8800d903f980 ffff880119b7e020
    [ 28.688088] ffff880119b7e010 0000000000000000 0000000000000010 0000000000000010
    [ 28.688088] Call Trace:
    [ 28.688088] [] scatterwalk_done+0x38/0x40
    [ 28.688088] [] scatterwalk_done+0x38/0x40
    [ 28.688088] [] blkcipher_walk_done+0x182/0x220
    [ 28.688088] [] crypto_cbc_encrypt+0x15f/0x180
    [ 28.688088] [] ? crypto_aes_set_key+0x30/0x30
    [ 28.688088] [] ceph_aes_encrypt2+0x29c/0x2e0
    [ 28.688088] [] ceph_encrypt2+0x93/0xb0
    [ 28.688088] [] ceph_x_encrypt+0x4a/0x60
    [ 28.688088] [] ? ceph_buffer_new+0x5d/0xf0
    [ 28.688088] [] ceph_x_build_authorizer.isra.6+0x297/0x360
    [ 28.688088] [] ? kmem_cache_alloc_trace+0x11b/0x1c0
    [ 28.688088] [] ? ceph_auth_create_authorizer+0x36/0x80
    [ 28.688088] [] ceph_x_create_authorizer+0x63/0xd0
    [ 28.688088] [] ceph_auth_create_authorizer+0x54/0x80
    [ 28.688088] [] get_authorizer+0x80/0xd0
    [ 28.688088] [] prepare_write_connect+0x18b/0x2b0
    [ 28.688088] [] try_read+0x1e59/0x1f10

    This is because we set up crypto scatterlists as if all buffers were
    kmalloc'ed. Fix it.

    Cc: stable@vger.kernel.org
    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil

    Ilya Dryomov
     
  • Bruce reported that he was seeing the following BUG pop:

    BUG: sleeping function called from invalid context at mm/slab.c:2846
    in_atomic(): 0, irqs_disabled(): 0, pid: 4539, name: mount.nfs
    2 locks held by mount.nfs/4539:
    #0: (nfs_clid_init_mutex){+.+.+.}, at: [] nfs4_discover_server_trunking+0x4a/0x2f0 [nfsv4]
    #1: (rcu_read_lock){......}, at: [] gss_stringify_acceptor+0x5/0xb0 [auth_rpcgss]
    Preemption disabled at:[] printk+0x4d/0x4f

    CPU: 3 PID: 4539 Comm: mount.nfs Not tainted 3.18.0-rc1-00013-g5b095e9 #3393
    Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    ffff880021499390 ffff8800381476a8 ffffffff81a534cf 0000000000000001
    0000000000000000 ffff8800381476c8 ffffffff81097854 00000000000000d0
    0000000000000018 ffff880038147718 ffffffff8118e4f3 0000000020479f00
    Call Trace:
    [] dump_stack+0x4f/0x7c
    [] __might_sleep+0x114/0x180
    [] __kmalloc+0x1a3/0x280
    [] gss_stringify_acceptor+0x58/0xb0 [auth_rpcgss]
    [] ? gss_stringify_acceptor+0x5/0xb0 [auth_rpcgss]
    [] rpcauth_stringify_acceptor+0x18/0x30 [sunrpc]
    [] nfs4_proc_setclientid+0x199/0x380 [nfsv4]
    [] ? nfs4_proc_setclientid+0x200/0x380 [nfsv4]
    [] nfs40_discover_server_trunking+0xda/0x150 [nfsv4]
    [] ? nfs40_discover_server_trunking+0x5/0x150 [nfsv4]
    [] nfs4_discover_server_trunking+0x7f/0x2f0 [nfsv4]
    [] nfs4_init_client+0x104/0x2f0 [nfsv4]
    [] nfs_get_client+0x314/0x3f0 [nfs]
    [] ? nfs_get_client+0xe0/0x3f0 [nfs]
    [] nfs4_set_client+0x8a/0x110 [nfsv4]
    [] ? __rpc_init_priority_wait_queue+0xa8/0xf0 [sunrpc]
    [] nfs4_create_server+0x12f/0x390 [nfsv4]
    [] nfs4_remote_mount+0x32/0x60 [nfsv4]
    [] mount_fs+0x39/0x1b0
    [] ? __alloc_percpu+0x15/0x20
    [] vfs_kern_mount+0x6b/0x150
    [] nfs_do_root_mount+0x86/0xc0 [nfsv4]
    [] nfs4_try_mount+0x44/0xc0 [nfsv4]
    [] ? get_nfs_version+0x27/0x90 [nfs]
    [] nfs_fs_mount+0x47d/0xd60 [nfs]
    [] ? mutex_unlock+0xe/0x10
    [] ? nfs_remount+0x430/0x430 [nfs]
    [] ? nfs_clone_super+0x140/0x140 [nfs]
    [] mount_fs+0x39/0x1b0
    [] ? __alloc_percpu+0x15/0x20
    [] vfs_kern_mount+0x6b/0x150
    [] do_mount+0x210/0xbe0
    [] ? copy_mount_options+0x3a/0x160
    [] SyS_mount+0x6f/0xb0
    [] system_call_fastpath+0x12/0x17

    Sleeping under the rcu_read_lock is bad. This patch fixes it by dropping
    the rcu_read_lock before doing the allocation and then reacquiring it
    and redoing the dereference before doing the copy. If we find that the
    string has somehow grown in the meantime, we'll reallocate and try again.
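
The drop, allocate, reacquire and recheck loop described above can be sketched in userspace C. This is only an illustration under stated assumptions, not the patch itself: `fake_read_lock()`, `fake_read_unlock()`, `alloc_may_sleep()`, `struct acceptor` and `current_acceptor` are hypothetical stand-ins for the RCU read-side section, a sleeping allocation and the shared string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for rcu_read_lock()/rcu_read_unlock(): they only track nesting
 * so the sketch can check that no allocation happens inside the section. */
static int read_depth;
static void fake_read_lock(void)   { read_depth++; }
static void fake_read_unlock(void) { read_depth--; }

/* Stand-in for an allocation that may sleep. */
static void *alloc_may_sleep(size_t n)
{
    assert(read_depth == 0);   /* must not be called under the read lock */
    return malloc(n);
}

struct acceptor {              /* hypothetical shape of the shared string */
    size_t len;
    const char *data;
};

const struct acceptor *current_acceptor;   /* pointer a writer may replace */

/* Returns a malloc'ed NUL-terminated copy, or NULL if absent or out of memory. */
char *stringify_acceptor(void)
{
    char *buf = NULL;
    size_t cap = 0;

    for (;;) {
        const struct acceptor *a;
        size_t want;

        fake_read_lock();
        a = current_acceptor;              /* redo the dereference each pass */
        if (!a) {
            fake_read_unlock();
            free(buf);
            return NULL;
        }
        want = a->len;
        if (buf && want <= cap) {          /* buffer is big enough: copy */
            memcpy(buf, a->data, want);
            buf[want] = '\0';
            fake_read_unlock();
            return buf;
        }
        fake_read_unlock();                /* drop the lock before allocating */

        free(buf);                         /* too small (or first pass): retry */
        buf = alloc_may_sleep(want + 1);
        if (!buf)
            return NULL;
        cap = want;
    }
}
```

The first pass only measures the string; the copy happens on a later pass, once a large enough buffer exists and the pointer has been re-read under the lock.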

    Cc: # v3.17+
    Reported-by: "J. Bruce Fields"
    Signed-off-by: Jeff Layton
    Signed-off-by: Trond Myklebust

    Jeff Layton
     

13 Nov, 2014

1 commit

  • Even if netlink_kernel_cfg::unbind is implemented, the unbind() method is
    never called, because cfg->unbind is omitted in __netlink_kernel_create().
    Also fix the wrong argument to test_bit() and an off-by-one problem.

    At this point, no unbind() method is implemented, so there is no real
    issue.

    Fixes: 4f520900522f ("netlink: have netlink per-protocol bind function return an error code.")
    Signed-off-by: Hiroaki SHIMODA
    Cc: Richard Guy Briggs
    Acked-by: Richard Guy Briggs
    Signed-off-by: David S. Miller

    Hiroaki SHIMODA
     

12 Nov, 2014

8 commits

  • The existing xtables matches and targets, when used from nft_compat, may
    sleep from the destroy path, ie. when removing rules. Since the objects
    are released via call_rcu from softirq context, this results in lockdep
    splats and possible lockups that may be hard to reproduce.

    Patrick also indicated that delayed object release via call_rcu can
    cause us problems in the ordering of event notifications when anonymous
    sets are in place.

    So, this patch restores the synchronous object release from the commit
    and abort paths. This includes a call to synchronize_rcu() to make sure
    that no packets are walking on the objects that are going to be
    released. This is slower, but it is simple and it resolves the
    aforementioned problems.

    This is a partial revert of c7c32e7 ("netfilter: nf_tables: defer all
    object release via rcu") that was introduced in 3.16 to speed up
    interaction with userspace.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • Instead of the match->name, which is of course not relevant.

    Fixes: f3f5dde ("netfilter: nft_compat: validate chain type in match/target")
    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • Check for nat chain dependency only, which is the one that can
    actually crash the kernel. Don't care if mangle, filter and security
    specific match and targets are used out of their scope, they are
    harmless.

    This restores iptables-compat for mangle-specific matches/targets used
    outside the OUTPUT chain. These are actually emulated through filter
    chains, and they broke when strict validation was performed.

    Fixes: f3f5dde ("netfilter: nft_compat: validate chain type in match/target")
    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • Instead of init_net when using xtables over nftables compat.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • ip_vs_prepare_tunneled_skb() ignores ->sk when allocating a new
    skb, either unconditionally setting ->sk to NULL or allowing
    the uninitialized ->sk from a newly allocated skb to leak through
    to the caller.

    This patch properly copies ->sk and increments its reference count.

    Signed-off-by: Calvin Owens
    Acked-by: Julian Anastasov
    Signed-off-by: Simon Horman

    Calvin Owens
     
  • Use IS_ENABLED(CONFIG_IPV6), to enable this code if IPv6 is
    a module.
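
To illustrate why a plain `#ifdef CONFIG_IPV6` misses the modular case, here is a simplified userspace re-creation of the preprocessor trick behind `IS_ENABLED()`. The helper macro names and the two test functions are assumptions made for this sketch; it also assumes that a modular build defines `CONFIG_IPV6_MODULE` rather than `CONFIG_IPV6`.

```c
/* Assumed build configuration for the sketch: IPv6 built as a module,
 * so only the _MODULE symbol is defined. */
#define CONFIG_IPV6_MODULE 1

/* Expands to 1 if the option is defined to 1, otherwise to 0. */
#define ARG_PLACEHOLDER_1 0,
#define TAKE_SECOND_ARG(ignored, val, ...) val
#define IS_DEFINED(x)          IS_DEFINED_1(x)
#define IS_DEFINED_1(val)      IS_DEFINED_2(ARG_PLACEHOLDER_##val)
#define IS_DEFINED_2(arg_or_junk) TAKE_SECOND_ARG(arg_or_junk 1, 0)

#define IS_BUILTIN(option) IS_DEFINED(option)
#define IS_MODULE(option)  IS_DEFINED(option##_MODULE)
#define IS_ENABLED(option) (IS_BUILTIN(option) || IS_MODULE(option))

/* Old style: only true when IPv6 is built in. */
int ipv6_path_compiled_ifdef(void)
{
#ifdef CONFIG_IPV6
    return 1;
#else
    return 0;
#endif
}

/* New style: true when IPv6 is built in or built as a module. */
int ipv6_path_compiled_is_enabled(void)
{
    return IS_ENABLED(CONFIG_IPV6);
}
```

With the configuration above, the `#ifdef` variant compiles the IPv6 path out, while the `IS_ENABLED()` variant keeps it.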

    Signed-off-by: Eric Dumazet
    Fixes: c8e6ad0829a7 ("ipv6: honor IPV6_PKTINFO with v4 mapped addresses on sendmsg")
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • A very minimal and simple user space application allocating an SCTP
    socket, setting SCTP_AUTH_KEY setsockopt(2) on it and then closing
    the socket again will leak the memory containing the authentication
    key from user space:

    unreferenced object 0xffff8800837047c0 (size 16):
    comm "a.out", pid 2789, jiffies 4296954322 (age 192.258s)
    hex dump (first 16 bytes):
    01 00 00 00 04 00 00 00 00 00 00 00 00 00 00 00 ................
    backtrace:
    [] kmemleak_alloc+0x4e/0xb0
    [] __kmalloc+0xe8/0x270
    [] sctp_auth_create_key+0x23/0x50 [sctp]
    [] sctp_auth_set_key+0xa1/0x140 [sctp]
    [] sctp_setsockopt+0xd03/0x1180 [sctp]
    [] sock_common_setsockopt+0x14/0x20
    [] SyS_setsockopt+0x71/0xd0
    [] system_call_fastpath+0x12/0x17
    [] 0xffffffffffffffff

    This is bad for two reasons: we can bring down a machine from user
    space when auth_enable=1, and we leave security-sensitive keying
    material in memory without clearing it after use. The issue is that
    sctp_auth_create_key() already sets the refcount to 1, but after
    allocation sctp_auth_set_key() takes an additional reference on it,
    which leaves the key around when we free the socket.
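
The refcount imbalance can be reproduced in a small userspace sketch. All names here (`key_create()`, `key_hold()`, `key_put()`, `set_key_buggy()`, `set_key_fixed()`, `live_keys`) are hypothetical and only mirror the roles described above; this is not the SCTP code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct key_bytes {
    int refcnt;
    size_t len;
    unsigned char data[];
};

int live_keys;                          /* allocated keys not yet freed */

/* Like the create helper described above: returns with refcnt == 1. */
struct key_bytes *key_create(size_t len)
{
    struct key_bytes *k = calloc(1, sizeof(*k) + len);
    if (!k)
        return NULL;
    k->refcnt = 1;
    k->len = len;
    live_keys++;
    return k;
}

void key_hold(struct key_bytes *k) { k->refcnt++; }

void key_put(struct key_bytes *k)
{
    assert(k->refcnt > 0);
    if (--k->refcnt == 0) {
        volatile unsigned char *p = k->data;
        for (size_t i = 0; i < k->len; i++)
            p[i] = 0;                   /* wipe keying material before free */
        free(k);
        live_keys--;
    }
}

struct shared_key { struct key_bytes *key; };

static int set_key_common(struct shared_key *sh, const unsigned char *src,
                          size_t len, int extra_hold)
{
    struct key_bytes *k = key_create(len);
    if (!k)
        return -1;
    memcpy(k->data, src, len);
    if (extra_hold)
        key_hold(k);                    /* the bug: refcnt becomes 2 */
    if (sh->key)
        key_put(sh->key);
    sh->key = k;
    return 0;
}

int set_key_buggy(struct shared_key *sh, const unsigned char *src, size_t len)
{
    return set_key_common(sh, src, len, 1);
}

int set_key_fixed(struct shared_key *sh, const unsigned char *src, size_t len)
{
    return set_key_common(sh, src, len, 0);
}

/* Socket teardown drops exactly one reference. */
void shared_key_free(struct shared_key *sh)
{
    if (sh->key)
        key_put(sh->key);
    sh->key = NULL;
}
```

After `set_key_buggy()` followed by `shared_key_free()`, the key still holds one reference, so it is never wiped or freed; with `set_key_fixed()` the single put releases it.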

    Fixes: 65b07e5d0d0 ("[SCTP]: API updates to suport SCTP-AUTH extensions.")
    Signed-off-by: Daniel Borkmann
    Cc: Vlad Yasevich
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • An SCTP server doing ASCONF will panic on malformed INIT ping-of-death
    in the form of:

    ------------ INIT[PARAM: SET_PRIMARY_IP] ------------>

    While the INIT chunk parameter verification dissects many things in
    order to detect malformed input, it fails to check parameters nested
    inside other parameters. E.g. RFC5061, section 4.2.4 proposes a 'set
    primary IP address' parameter in ASCONF, which has an address parameter
    as a subparameter.

    So if an attacker sends a parameter type other than
    SCTP_PARAM_IPV4_ADDRESS or SCTP_PARAM_IPV6_ADDRESS, param_type2af()
    returns 0 and thus sctp_get_af_specific() returns NULL, too, which we
    then happily dereference unconditionally through af->from_addr_param().

    The trace for the log:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
    IP: [] sctp_process_init+0x492/0x990 [sctp]
    PGD 0
    Oops: 0000 [#1] SMP
    [...]
    Pid: 0, comm: swapper Not tainted 2.6.32-504.el6.x86_64 #1 Bochs Bochs
    RIP: 0010:[] [] sctp_process_init+0x492/0x990 [sctp]
    [...]
    Call Trace:

    [] ? sctp_bind_addr_copy+0x5d/0xe0 [sctp]
    [] sctp_sf_do_5_1B_init+0x21b/0x340 [sctp]
    [] sctp_do_sm+0x71/0x1210 [sctp]
    [] ? sctp_endpoint_lookup_assoc+0xc9/0xf0 [sctp]
    [] sctp_endpoint_bh_rcv+0x116/0x230 [sctp]
    [] sctp_inq_push+0x56/0x80 [sctp]
    [] sctp_rcv+0x982/0xa10 [sctp]
    [] ? ipt_local_in_hook+0x23/0x28 [iptable_filter]
    [] ? nf_iterate+0x69/0xb0
    [] ? ip_local_deliver_finish+0x0/0x2d0
    [] ? nf_hook_slow+0x76/0x120
    [] ? ip_local_deliver_finish+0x0/0x2d0
    [...]

    A minimal way to address this is to check for NULL as we do on all
    other such occasions where we know sctp_get_af_specific() could
    possibly return with NULL.
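
The shape of that NULL check can be sketched in userspace C. Everything here is hypothetical and simplified: the enum values, `struct af_ops`, `param_type_to_family()`, `get_af_specific()` and `process_set_primary()` only mimic the lookup chain described above, and the handlers simply return an address length.

```c
#include <stddef.h>

/* Illustrative parameter types and address families (values are assumptions). */
enum { PARAM_IPV4_ADDRESS = 5, PARAM_IPV6_ADDRESS = 6 };
enum { FAMILY_NONE = 0, FAMILY_V4 = 4, FAMILY_V6 = 6 };

struct af_ops {
    int family;
    int (*from_addr_param)(void);       /* returns the address length */
};

static int v4_from_addr_param(void) { return 4; }
static int v6_from_addr_param(void) { return 16; }

static const struct af_ops v4_ops = { FAMILY_V4, v4_from_addr_param };
static const struct af_ops v6_ops = { FAMILY_V6, v6_from_addr_param };

/* Maps a nested parameter type to a family; unknown types map to 0. */
int param_type_to_family(unsigned int type)
{
    switch (type) {
    case PARAM_IPV4_ADDRESS: return FAMILY_V4;
    case PARAM_IPV6_ADDRESS: return FAMILY_V6;
    default:                 return FAMILY_NONE;
    }
}

/* Returns the per-family operations, or NULL for an unknown family. */
const struct af_ops *get_af_specific(int family)
{
    switch (family) {
    case FAMILY_V4: return &v4_ops;
    case FAMILY_V6: return &v6_ops;
    default:        return NULL;
    }
}

/* Handles the nested address parameter; returns -1 on malformed input. */
int process_set_primary(unsigned int nested_type)
{
    const struct af_ops *af = get_af_specific(param_type_to_family(nested_type));

    if (!af)                            /* the check the fix adds */
        return -1;
    return af->from_addr_param();
}
```

Without the `if (!af)` test, an unexpected nested type would reach `af->from_addr_param()` with `af == NULL`.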

    Fixes: d6de3097592b ("[SCTP]: Add the handling of "Set Primary IP Address" parameter to INIT")
    Signed-off-by: Daniel Borkmann
    Cc: Vlad Yasevich
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Daniel Borkmann