04 Nov, 2014

1 commit

  • Pull ceph fixes from Sage Weil:
    "There is a GFP flag fix from Mike Christie, an error code fix from
    Jan, and fixes for two unnecessary allocations (kmalloc and workqueue)
    from Ilya. All are well tested.

    Ilya has one other fix on the way but it didn't get tested in time"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
    libceph: eliminate unnecessary allocation in process_one_ticket()
    rbd: Fix error recovery in rbd_obj_read_sync()
    libceph: use memalloc flags for net IO
    rbd: use a single workqueue for all devices

    Linus Torvalds
     

03 Nov, 2014

1 commit

  • The sk_prot is irda's own set of protocol handlers, so irda should
    statically know what that function is anyway, without using an indirect
    pointer. And as it happens, we know *exactly* what that pointer is
    statically: it's NULL, because irda doesn't define a disconnect
    operation.

    So calling that function is doubly wrong, and will just cause an oops.

    Reported-by: Martin Lang
    Cc: Samuel Ortiz
    Cc: David Miller
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

01 Nov, 2014

5 commits

  • Commit c27a3e4d667f ("libceph: do not hard code max auth ticket len"),
    while fixing a buffer overflow, tried to keep as much of the
    surrounding code the same as possible and introduced an unnecessary
    kmalloc() in the unencrypted ticket path. It is likely to fail on huge
    tickets, so get rid of it.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil

    Ilya Dryomov
     
  • If a driver supports reading EEPROM but no EEPROM is installed in the system,
    the driver's get_eeprom_len function returns 0. ethtool will subsequently
    try to read that zero-length EEPROM anyway. If the driver does not support
    EEPROM access at all, this operation will return -EOPNOTSUPP. If the driver
    does support EEPROM access but no EEPROM is installed, the operation will
    return -EINVAL. Return -EOPNOTSUPP in both cases for consistency.

    Signed-off-by: Guenter Roeck
    Tested-by: Andrew Lunn
    Signed-off-by: David S. Miller

    Guenter Roeck
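
A minimal userspace sketch of the behaviour described in the commit above, assuming a hypothetical `sketch_eeprom_ops` structure in place of the real ethtool operations. It only illustrates that both "no EEPROM support" and "no EEPROM installed" end up as -EOPNOTSUPP; it is not the kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a driver's ethtool operations. */
struct sketch_eeprom_ops {
    int (*get_eeprom_len)(void);   /* NULL when EEPROM access is unsupported */
};

/* Hypothetical drivers: one with no EEPROM installed, one with 256 bytes. */
static int sketch_len_zero(void) { return 0; }
static int sketch_len_256(void)  { return 256; }

/* Returns the EEPROM length, or -EOPNOTSUPP when there is nothing to read. */
static int sketch_eeprom_check(const struct sketch_eeprom_ops *ops)
{
    int len;

    if (ops == NULL || ops->get_eeprom_len == NULL)
        return -EOPNOTSUPP;        /* driver has no EEPROM support */

    len = ops->get_eeprom_len();
    if (len <= 0)
        return -EOPNOTSUPP;        /* supported, but no EEPROM present */

    return len;
}
```

A caller then only has to handle one error code for "there is no EEPROM to read", whatever the reason.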
     
  • Kconfig already allows mpls to be built as a module. The following
    patch fixes the Makefile to do the same.

    CC: Simon Horman
    Signed-off-by: Pravin B Shelar
    Acked-by: Simon Horman
    Signed-off-by: David S. Miller

    Pravin B Shelar
     
  • The mpls gso handler needs to pull the skb after segmenting it.

    CC: Simon Horman
    Signed-off-by: Pravin B Shelar
    Acked-by: Simon Horman
    Signed-off-by: David S. Miller

    Pravin B Shelar
     
  • Pablo Neira Ayuso says:

    ====================
    netfilter/ipvs fixes for net

    The following patchset contains fixes for netfilter/ipvs. This round of
    fixes is larger than usual at this stage, specifically because of the
    nf_tables bridge reject fixes that I would like to see in 3.18. The
    patches are:

    1) Fix a null-pointer dereference that may occur when logging
    errors. This problem was introduced by 4a4739d56b0 ("ipvs: Pull
    out crosses_local_route_boundary logic") in v3.17-rc5.

    2) Update hook mask in nft_reject_bridge so we can also filter out
    packets from there. This fixes 36d2af5 ("netfilter: nf_tables: allow
    to filter from prerouting and postrouting"), which needs this chunk
    to work.

    3) Two patches to refactor common code to forge the IPv4 and IPv6
    reject packets from the bridge. These are required by the nf_tables
    reject bridge fix.

    4) Fix nft_reject_bridge by avoiding the use of the IP stack to reject
    packets from the bridge. The idea is to forge the reject packets and
    inject them to the original port via br_deliver() which is now
    exported for that purpose.

    5) Restrict nft_reject_bridge to bridge prerouting and input hooks.
    The original skbuff may be cloned after prerouting when the bridge
    stack needs to flood it to several bridge ports, at which point it is
    too late to reject the traffic.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

31 Oct, 2014

9 commits

  • Restrict the reject expression to the prerouting and input bridge
    hooks. If we allow this to be used from forward or any other later
    bridge hook, if the frame is flooded to several ports, we'll end up
    sending several reject packets, one per cloned packet.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
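
The hook restriction described in the commit above can be sketched as a simple mask check. This is a userspace illustration under stated assumptions: the enum and function names are local to this sketch, with values chosen to mirror the order of the bridge netfilter hooks, and the error code is an assumption rather than the one the patch necessarily returns.

```c
#include <assert.h>
#include <errno.h>

/* Local stand-ins for the bridge hooks, in bridge-stack order. */
enum sketch_br_hook {
    SKETCH_BR_PRE_ROUTING  = 0,
    SKETCH_BR_LOCAL_IN     = 1,
    SKETCH_BR_FORWARD      = 2,
    SKETCH_BR_LOCAL_OUT    = 3,
    SKETCH_BR_POST_ROUTING = 4
};

/* Only prerouting and input may use the reject expression. */
#define SKETCH_REJECT_HOOKS \
    ((1u << SKETCH_BR_PRE_ROUTING) | (1u << SKETCH_BR_LOCAL_IN))

/* Returns 0 when the hook is allowed, -EOPNOTSUPP otherwise (assumed code). */
static int sketch_reject_validate(enum sketch_br_hook hook)
{
    if (!((1u << hook) & SKETCH_REJECT_HOOKS))
        return -EOPNOTSUPP;
    return 0;
}
```

Rejecting the rule at validation time, rather than at packet time, avoids ever reaching the point where a flooded frame would produce one reject packet per clone.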
     
  • If the packet is received via the bridge stack, this cannot reject
    packets from the IP stack.

    This adds functions to build the reject packet and send it from the
    bridge stack. Comments and assumptions on this patch:

    1) Validate the IPv4 and IPv6 headers before further processing.
    Given that the packet comes from the bridge stack, we cannot assume
    they are clean. Truncated packets are dropped; we follow a similar
    approach to the existing iptables match/target extensions that need
    to inspect layer 4 headers that are not available. This also includes
    packets that are directed to multicast and broadcast ethernet
    addresses.

    2) br_deliver() is exported to inject the reject packet via
    bridge localout -> postrouting. So the approach is similar to what
    we already do in the iptables reject target. The reject packet is
    sent to the bridge port from which we have received the original
    packet.

    3) The reject packet is forged based on the original packet. The TTL
    is set based on sysctl_ip_default_ttl for IPv4 and per-net
    ipv6.devconf_all hoplimit for IPv6.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • That can be reused by the reject bridge expression to build the reject
    packet. The new functions are:

    * nf_reject_ip6_tcphdr_get(): to sanitize and to obtain the TCP header.
    * nf_reject_ip6hdr_put(): to build the IPv6 header.
    * nf_reject_ip6_tcphdr_put(): to build the TCP header.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • That can be reused by the reject bridge expression to build the reject
    packet. The new functions are:

    * nf_reject_ip_tcphdr_get(): to sanitize and to obtain the TCP header.
    * nf_reject_iphdr_put(): to build the IPv4 header.
    * nf_reject_ip_tcphdr_put(): to build the TCP header.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • Fixes: 36d2af5 ("netfilter: nf_tables: allow to filter from prerouting and postrouting")
    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • UFO is now disabled on all drivers that work with virtio net headers,
    but userland may try to send UFO/IPv6 packets anyway. Instead of
    sending with ID=0, we should select identifiers on their behalf (as we
    used to).

    Signed-off-by: Ben Hutchings
    Fixes: 916e4cf46d02 ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data")
    Signed-off-by: David S. Miller

    Ben Hutchings
     
  • Some drivers are unable to perform TX completions in bounded time.
    They instead call skb_orphan().

    Problem is skb_fclone_busy() has to detect this case, otherwise
    we block TCP retransmits and can freeze unlucky tcp sessions on
    mostly idle hosts.

    Signed-off-by: Eric Dumazet
    Fixes: 1f3279ae0c13 ("tcp: avoid retransmits of TCP packets hanging in host queues")
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Currently, skb_inner_network_header is used but this does not account
    for Ethernet header for ETH_P_TEB. Use skb_inner_mac_header which
    handles TEB and also should work with IP encapsulation in which case
    inner mac and inner network headers are the same.

    Tested: Ran TCP_STREAM over GRE, worked as expected.

    Signed-off-by: Tom Herbert
    Acked-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Tom Herbert
     
  • If we cache them, the kernel will reuse them, independently of
    whether forwarding is enabled or not. Which means that if forwarding is
    disabled on the input interface where the first routing request comes
    from, then that unreachable result will be cached and reused for
    other interfaces, even if forwarding is enabled on them. The opposite
    is also true.

    This can be verified with two interfaces A and B and an output interface
    C, where B has forwarding enabled, but not A and trying
    ip route get $dst iif A from $src && ip route get $dst iif B from $src

    Signed-off-by: Nicolas Cavallari
    Reviewed-by: Julian Anastasov
    Signed-off-by: David S. Miller

    Nicolas Cavallari
     

30 Oct, 2014

5 commits

  • This patch has ceph's lib code use the memalloc flags.

    If the VM layer needs to write data out to free up memory to handle new
    allocation requests, the block layer must be able to make forward progress.
    To handle that requirement we use structs like mempools to reserve memory for
    objects like bios and requests.

    The problem is when we send/receive block layer requests over the network
    layer, net skb allocations can fail and the system can lock up.
    To solve this, the memalloc related flags were added. NBD, iSCSI
    and NFS use these flags to tell the network/vm layer that it should
    use memory reserves to fulfill allocation requests for structs like
    skbs.

    I am running ceph in a bunch of VMs on my laptop, so this patch was
    not tested very harshly.

    Signed-off-by: Mike Christie
    Reviewed-by: Ilya Dryomov

    Mike Christie
     
  • The WARN_ON in inet_evict_bucket can be triggered by a valid case:
    inet_frag_kill and inet_evict_bucket can be running in parallel on the
    same queue which means that there has been at least one more ref added
    by a previous inet_frag_find call, but inet_frag_kill can delete the
    timer before inet_evict_bucket which will cause the WARN_ON() there to
    trigger since we'll have refcnt!=1. Now, this case is valid because the
    queue is being "killed" for some reason (removed from the chain list and
    its timer deleted) so it will get destroyed in the end by one of the
    inet_frag_put() calls which reaches 0 i.e. refcnt is still valid.

    CC: Florian Westphal
    CC: Eric Dumazet
    CC: Patrick McLean

    Fixes: b13d3cbfb8e8 ("inet: frag: move eviction of queues to work queue")
    Reported-by: Patrick McLean
    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • When the evictor is running it adds some chosen frags to a local list to
    be evicted once the chain lock has been released but at the same time
    the *frag_queue can be running for some of the same queues and it
    may call inet_frag_kill which will wait on the chain lock and
    will then delete the queue from the wrong list since it was added in the
    eviction one. The fix is simple - check if the queue has the evict flag
    set under the chain lock before deleting it, this is safe because the
    evict flag is set only under that lock and having the flag set also means
    that the queue has been detached from the chain list, so no need to delete
    it again.
    An important note to make is that we're safe w.r.t refcnt because
    inet_frag_kill and inet_evict_bucket will sync on the del_timer operation
    where only one of the two can succeed (or if the timer is executing -
    none of them), the cases are:
    1. inet_frag_kill succeeds in del_timer
    - then the timer ref is removed, but inet_evict_bucket will not add
    this queue to its expire list but will restart eviction in that chain
    2. inet_evict_bucket succeeds in del_timer
    - then the timer ref is kept until the evictor "expires" the queue, but
    inet_frag_kill will remove the initial ref and will set
    INET_FRAG_COMPLETE which will make the frag_expire fn just to remove
    its ref.
    In the end all of the queue users will do an inet_frag_put and the one
    that reaches 0 will free it. The refcount balance should be okay.

    CC: Florian Westphal
    CC: Eric Dumazet
    CC: Patrick McLean

    Fixes: b13d3cbfb8e8 ("inet: frag: move eviction of queues to work queue")
    Suggested-by: Eric Dumazet
    Reported-by: Patrick McLean
    Tested-by: Patrick McLean
    Signed-off-by: Nikolay Aleksandrov
    Reviewed-by: Florian Westphal
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • NetworkManager might want to know that it changed when the router advertisement
    arrives.

    Signed-off-by: Lubomir Rintel
    Cc: Hannes Frederic Sowa
    Cc: Daniel Borkmann
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Lubomir Rintel
     
  • Cc: Vijay Subramanian
    Cc: David S. Miller
    Signed-off-by: Cong Wang
    Acked-by: Eric Dumazet

    WANG Cong
     

29 Oct, 2014

2 commits

  • John W. Linville says:

    ====================
    pull request: wireless 2014-10-28

    Please pull this batch of fixes intended for the 3.18 stream!

    For the mac80211 bits, Johannes says:

    "Here are a few fixes for the wireless stack: one fixes the
    RTS rate, one for a debugfs file, one to return the correct
    channel to userspace, a sanity check for a userspace value
    and the remaining two are just documentation fixes."

    For the iwlwifi bits, Emmanuel says:

    "I revert here a patch that caused interoperability issues.
    dvm gets a fix for a bug that was reported by many users.
    Two minor fixes for BT Coex and platform power fix that helps
    reducing latency when the PCIe link goes to low power states."

    In addition...

    Felix Fietkau adds a couple of ath code fixes related to regulatory
    rule enforcement.

    Hauke Mehrtens fixes a build break with bcma when CONFIG_OF_ADDRESS
    is not set.

    Karsten Wiese provides a trio of minor fixes for rtl8192cu.

    Kees Cook prevents a potential information leak in rtlwifi.

    Larry Finger also brings a trio of minor fixes for rtlwifi.

    Rafał Miłecki adds a device ID to the bcma bus driver.

    Rickard Strandqvist offers some strn* -> strl* changes in brcmfmac
    to eliminate non-terminated string issues.

    Sujith Manoharan avoids some ath9k stalls by enabling HW queue control
    only for MCC.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • If there is a mismatch between enabled tagging protocols and the
    protocol the switch supports, error out, rather than continue with a
    situation which is unlikely to work.

    Signed-off-by: Andrew Lunn
    cc: alexander.h.duyck@intel.com
    Acked-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Andrew Lunn
     

28 Oct, 2014

5 commits

  • Use daddr instead of reaching into dest.

    Reported-by: Dan Carpenter
    Signed-off-by: Alex Gartrell
    Signed-off-by: Simon Horman

    Alex Gartrell
     
  • introduce two configs:
    - hidden CONFIG_BPF to select eBPF interpreter that classic socket filters
    depend on
    - visible CONFIG_BPF_SYSCALL (default off) that tracing and sockets can use

    that solves several problems:
    - tracing and others that wish to use eBPF don't need to depend on NET.
    They can use BPF_SYSCALL to allow loading from userspace or select BPF
    to use it directly from kernel in NET-less configs.
    - in 3.18 programs cannot be attached to events yet, so don't force it on
    - when the rest of eBPF infra is there in 3.19+, it's still useful to
    switch it off to minimize kernel size

    bloat-o-meter on x64 shows:
    add/remove: 0/60 grow/shrink: 0/2 up/down: 0/-15601 (-15601)

    tested with many different config combinations. Hopefully didn't miss anything.

    Signed-off-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     
  • Pablo Neira Ayuso says:

    ====================
    Netfilter fixes for net

    The following patchset contains Netfilter fixes for your net tree,
    they are:

    1) Allow recycling a TCP port in conntrack when it changes role from
    server to client, from Marcelo Leitner.

    2) Fix possible off by one access in ip_set_nfnl_get_byindex(), patch
    from Dan Carpenter.

    3) alloc_percpu returns NULL on error, no need for IS_ERR() in nf_tables
    chain statistic updates. From Sabrina Dubroca.

    4) Don't compile ip options in bridge netfilter, this mangles the packet
    and bridge should not alter layer >= 3 headers when forwarding packets.
    Patch from Herbert Xu and tested by Florian Westphal.

    5) Account the final NLMSG_DONE message when calculating the size of the
    nflog netlink batches. Patch from Florian Westphal.

    6) Fix a possible netlink attribute length overflow with large packets.
    Again from Florian Westphal.

    7) Release the skbuff if nfnetlink_log fails to put the final
    NLMSG_DONE message. This fixes a leak on error. This shouldn't ever
    happen though, otherwise this means we miscalculate the netlink batch
    size, so spot a warning if this ever happens so we can track down the
    problem. This patch from Houcheng Lin.

    8) Look at the right list when recycling targets in the nft_compat,
    patch from Arturo Borrero.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • The code looks for an already loaded target, and the correct list to search
    is nft_target_list, not nft_match_list.

    Signed-off-by: Arturo Borrero Gonzalez
    Signed-off-by: Pablo Neira Ayuso

    Arturo Borrero
     
  • …ernel/git/jberg/mac80211

    Johannes Berg <johannes@sipsolutions.net> says:

    "Here are a few fixes for the wireless stack: one fixes the
    RTS rate, one for a debugfs file, one to return the correct
    channel to userspace, a sanity check for a userspace value
    and the remaining two are just documentation fixes."

    Signed-off-by: John W. Linville <linville@tuxdriver.com>

    John W. Linville
     

27 Oct, 2014

1 commit


26 Oct, 2014

1 commit

  • percpu tcp_md5sig_pool contains memory blobs that ultimately
    go through sg_set_buf().

    -> sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf));

    This requires that whole area is in a physically contiguous portion
    of memory. And that @buf is not backed by vmalloc().

    Given that alloc_percpu() can use vmalloc() areas, this does not
    fit the requirements.

    Replace alloc_percpu() by a static DEFINE_PER_CPU() as tcp_md5sig_pool
    is small anyway, there is no gain to dynamically allocate it.

    Signed-off-by: Eric Dumazet
    Fixes: 765cf9976e93 ("tcp: md5: remove one indirection level in tcp_md5sig_pool")
    Reported-by: Crestez Dan Leonard
    Signed-off-by: David S. Miller

    Eric Dumazet
     

24 Oct, 2014

4 commits

  • The kernel should reserve enough room in the skb so that the DONE
    message can always be appended. However, if e.g. a new attribute is
    erroneously not size-accounted for, __nfulnl_send() will still
    try to put the next nlmsg into this full skbuf, causing the skb to be
    stuck forever and blocking delivery of further messages.

    Fix issue by releasing skb immediately after nlmsg_put error and
    WARN() so we can track down the cause of such size mismatch.

    [ fw@strlen.de: add tailroom/len info to WARN ]

    Signed-off-by: Houcheng Lin
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Houcheng Lin
     
  • don't try to queue payloads > 0xffff - NLA_HDRLEN, it does not work.
    The nla length includes the size of the nla struct, so anything larger
    results in u16 integer overflow.

    This patch is similar to
    9cefbbc9c8f9abe (netfilter: nfnetlink_queue: cleanup copy_range usage).

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
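
The u16 overflow described in the commit above can be reproduced with plain arithmetic. The sketch below assumes a 4-byte attribute header (the value of NLA_HDRLEN) and uses names local to this example; it is an illustration, not the nfnetlink_log code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: a 4-byte netlink attribute header. */
#define SKETCH_NLA_HDRLEN      4u
#define SKETCH_NLA_MAX_PAYLOAD (0xffffu - SKETCH_NLA_HDRLEN)

/* Clamp a requested payload so header + payload still fits in 16 bits. */
static size_t sketch_clamp_payload(size_t len)
{
    return len > SKETCH_NLA_MAX_PAYLOAD ? SKETCH_NLA_MAX_PAYLOAD : len;
}

/* The attribute length field is 16 bits wide and includes the header. */
static uint16_t sketch_nla_len(size_t payload)
{
    return (uint16_t)(SKETCH_NLA_HDRLEN + payload);
}
```

With a payload one byte above the limit, the stored length wraps around to 0, which is why anything larger than 0xffff - NLA_HDRLEN cannot be queued.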
     
  • We currently neither account for the nlattr size, nor do we consider
    the size of the trailing NLMSG_DONE when allocating nlmsg skb.

    This can cause nflog to stop working, as __nfulnl_send() retries
    sending forever if it failed to append NLMSG_DONE (which will never
    work if the buffer is not large enough).

    Reported-by: Houcheng Lin
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
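
A rough sketch of the size accounting the commit above describes, assuming a 16-byte netlink message header, a 4-byte attribute header and 4-byte alignment. The function names and the "naive" variant are illustrative only and do not reproduce the actual nfnetlink_log calculation.

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions for this sketch: header sizes and 4-byte alignment. */
#define SKETCH_NLMSG_HDRLEN 16u
#define SKETCH_NLA_HDRLEN    4u
#define SKETCH_ALIGN4(x)    (((size_t)(x) + 3u) & ~(size_t)3u)

/* Aligned size of a netlink message carrying `payload` bytes. */
static size_t sketch_nlmsg_total(size_t payload)
{
    return SKETCH_ALIGN4(SKETCH_NLMSG_HDRLEN + payload);
}

/* Aligned size of one attribute carrying `payload` bytes. */
static size_t sketch_nla_total(size_t payload)
{
    return SKETCH_ALIGN4(SKETCH_NLA_HDRLEN + payload);
}

/* Illustrative undersized estimate: no attribute header, no DONE message. */
static size_t sketch_naive_size(size_t pkt_len)
{
    return sketch_nlmsg_total(pkt_len);
}

/* Estimate that includes the attribute header and the trailing DONE. */
static size_t sketch_needed_size(size_t pkt_len, size_t done_payload)
{
    return sketch_nlmsg_total(sketch_nla_total(pkt_len))
         + sketch_nlmsg_total(done_payload);
}
```

For a 5-byte payload and a 4-byte DONE payload, the naive estimate is 24 bytes while the full one is 48, so a buffer sized the naive way has no room left for the final message.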
     
  • Commit 462fb2af9788a82a534f8184abfde31574e1cfa0

    bridge : Sanitize skb before it enters the IP stack

    broke when IP options are actually used because it mangles the
    skb as if it entered the IP stack which is wrong because the
    bridge is supposed to operate below the IP stack.

    Since nobody has actually requested for parsing of IP options
    this patch fixes it by simply reverting to the previous approach
    of ignoring all IP options, i.e., zeroing the IPCB.

    If and when somebody who uses IP options and actually needs them
    to be parsed by the bridge complains then we can revisit this.

    Reported-by: David Newall
    Signed-off-by: Herbert Xu
    Tested-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Herbert Xu
     

23 Oct, 2014

3 commits

  • The commit "net: Save TX flow hash in sock and set in skbuf on xmit"
    introduced the inet_set_txhash() and ip6_set_txhash() routines to calculate
    and record flow hash(sk_txhash) in the socket structure. sk_txhash is used
    to set skb->hash which is used to spread flows across multiple TXQs.

    But, the above routines are invoked before the source port of the connection
    is created. Because of this all outgoing connections that just differ in the
    source port get hashed into the same TXQ.

    This patch fixes this problem for IPv4/6 by invoking the above routines
    after the source port is available for the socket.

    Fixes: b73c3d0e4("net: Save TX flow hash in sock and set in skbuf on xmit")

    Signed-off-by: Sathya Perla
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Sathya Perla
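
To illustrate the problem in the commit above, here is a toy flow hash (FNV-1a, chosen only for this sketch and not the hash the kernel uses). If the hash is taken while the source port is still unset, two connections to the same destination get identical hashes; once the real source ports are mixed in, the hashes differ.

```c
#include <assert.h>
#include <stdint.h>

/* Mix the low `nbytes` bytes of `value` into `h`, most significant first. */
static uint32_t sketch_mix(uint32_t h, uint32_t value, unsigned nbytes)
{
    while (nbytes--) {
        h ^= (value >> (8 * nbytes)) & 0xffu;
        h *= 16777619u;            /* FNV-1a 32-bit prime */
    }
    return h;
}

/* Toy 4-tuple hash; a stand-in for the real flow hash. */
static uint32_t sketch_flow_hash(uint32_t saddr, uint32_t daddr,
                                 uint16_t sport, uint16_t dport)
{
    uint32_t h = 2166136261u;      /* FNV-1a 32-bit offset basis */

    h = sketch_mix(h, saddr, 4);
    h = sketch_mix(h, daddr, 4);
    h = sketch_mix(h, dport, 2);
    h = sketch_mix(h, sport, 2);
    return h;
}
```

In this sketch a source port of 0 stands for "not yet assigned"; the addresses and ports used to exercise it are arbitrary example values.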
     
  • pskb_may_pull() may change skb->data and make the nh and exthdr
    pointers obsolete, so recompute nh and exthdr.

    Signed-off-by: Li RongQing
    Signed-off-by: David S. Miller

    Li RongQing
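
The same class of bug exists in userspace whenever a buffer can be reallocated. The sketch below uses realloc() as an analogy for pskb_may_pull(): a pointer into the old buffer may be stale afterwards, so the code keeps an offset and recomputes the pointer. The structure and helper names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer standing in for skb->data. */
struct sketch_buf {
    unsigned char *data;
    size_t len;
};

/* Grow the buffer; the data pointer may move, like after pskb_may_pull(). */
static int sketch_buf_grow(struct sketch_buf *b, size_t new_len)
{
    unsigned char *p;

    if (new_len <= b->len)
        return 0;
    p = realloc(b->data, new_len);
    if (p == NULL)
        return -1;
    memset(p + b->len, 0, new_len - b->len);   /* zero the new tail */
    b->data = p;
    b->len = new_len;
    return 0;
}

/* Always derive header pointers from the current data pointer. */
static unsigned char *sketch_hdr_at(const struct sketch_buf *b, size_t off)
{
    return b->data + off;
}
```

Keeping the offset and calling the accessor again after any operation that may move the data is the userspace equivalent of recomputing the header pointers.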
     
  • The crafted header start address is from a driver supplied buffer, which
    one can reasonably expect to be aligned on a 4-bytes boundary.
    However ATM the TSO helper API is only used by ethernet drivers and
    the tcp header will then be aligned to a 2-bytes only boundary from the
    header start address.

    Signed-off-by: Karl Beldan
    Cc: Ezequiel Garcia
    Signed-off-by: David S. Miller

    Karl Beldan
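
The alignment claim in the commit above follows from the header sizes. A small sketch, assuming a 14-byte Ethernet header and a 20-byte IPv4 header without options, with the buffer start aligned to 4 bytes:

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions for this sketch: plain Ethernet, IPv4 without options. */
#define SKETCH_ETH_HLEN  14u
#define SKETCH_IPV4_HLEN 20u

/* Offset of the TCP header from the (4-byte aligned) header start. */
static size_t sketch_tcp_hdr_offset(void)
{
    return SKETCH_ETH_HLEN + SKETCH_IPV4_HLEN;
}

/* Nonzero when `offset` is a multiple of `align`. */
static int sketch_is_aligned(size_t offset, size_t align)
{
    return offset % align == 0;
}
```

The TCP header starts at offset 34, which is a multiple of 2 but not of 4, so 32-bit accesses to its fields are unaligned on such a buffer.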
     

22 Oct, 2014

3 commits

  • alloc_percpu returns NULL on failure, not a negative error code.

    Fixes: ff3cd7b3c922 ("netfilter: nf_tables: refactor chain statistic routines")
    Signed-off-by: Sabrina Dubroca
    Signed-off-by: Pablo Neira Ayuso

    Sabrina Dubroca
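
Why an IS_ERR()-style check misses a NULL return can be shown with a userspace imitation of the error-pointer convention. The helpers below are assumptions written for this sketch (error codes encoded in the top 4095 addresses); they are not the kernel macros themselves.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: error pointers occupy the last 4095 addresses. */
#define SKETCH_MAX_ERRNO 4095

/* Nonzero when `ptr` encodes a negative error code. */
static int sketch_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Encode a negative error code as a pointer. */
static void *sketch_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}
```

Because NULL is not in the error range, an allocator that reports failure with NULL must be checked with a plain NULL test.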
     
  • The ->ip_set_list[] array is initialized in ip_set_net_init() and it
    has ->ip_set_max elements so this check should be >= instead of >
    otherwise we are off by one.

    Signed-off-by: Dan Carpenter
    Acked-by: Jozsef Kadlecsik
    Signed-off-by: Pablo Neira Ayuso

    Dan Carpenter
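
The off-by-one in the commit above comes down to which comparison guards the array access. A minimal sketch with a hypothetical lookup helper over an array of `max` elements:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical lookup: valid indices are 0 .. max - 1, so the guard
 * must reject index == max as well (">=" rather than ">").
 */
static const char *sketch_get_byindex(const char *const *list,
                                      size_t max, size_t index)
{
    if (index >= max)
        return NULL;
    return list[index];
}
```

With a `>` comparison, `index == max` would pass the check and read one element past the end of the array.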
     
  • When a port that was used to listen for inbound connections gets closed
    and reused for outgoing connections (like rsh ends up doing for stderr
    flow), currently we may reject the SYN/ACK packet for the new connection
    because the tcp_conntracks states forbid a port from becoming a client
    while there is still a TIME_WAIT entry in there for it.

    As TCP may expire the TIME_WAIT socket in 60s and conntrack's timeout
    for it is 120s, there is a ~60s window that the application can end up
    opening a port that conntrack will end up blocking.

    This patch fixes this by simply allowing such a state transition: if we
    see a SYN, in TIME_WAIT state, on REPLY direction, move it to sSS. Note
    that the rest of the code already handles this situation, more
    specifically in tcp_packet(), first switch clause.

    Signed-off-by: Marcelo Ricardo Leitner
    Acked-by: Jozsef Kadlecsik
    Signed-off-by: Pablo Neira Ayuso

    Marcelo Leitner
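
A deliberately tiny sketch of the rule the commit above adds, together with the timing window it mentions. Only the single transition described in the message is modelled; every other combination is reported as "unmodelled" rather than guessed. All names are local to this example and do not match the conntrack state table.

```c
#include <assert.h>

enum sketch_dir   { SKETCH_DIR_ORIGINAL, SKETCH_DIR_REPLY };
enum sketch_state { SKETCH_SYN_SENT, SKETCH_TIME_WAIT, SKETCH_UNMODELLED };

/* New state when a SYN is seen; only the patched case is modelled. */
static enum sketch_state sketch_on_syn(enum sketch_dir dir,
                                       enum sketch_state cur)
{
    if (dir == SKETCH_DIR_REPLY && cur == SKETCH_TIME_WAIT)
        return SKETCH_SYN_SENT;    /* "move it to sSS" */
    return SKETCH_UNMODELLED;      /* not covered by this sketch */
}

/* Seconds during which conntrack still holds TIME_WAIT but TCP does not. */
static int sketch_blocked_window(int tcp_tw_secs, int ct_tw_secs)
{
    return ct_tw_secs > tcp_tw_secs ? ct_tw_secs - tcp_tw_secs : 0;
}
```

With the timeouts quoted in the message (60s for TCP, 120s for conntrack), the window in which the port could be blocked is about 60 seconds.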