14 Nov, 2014

5 commits

  • Pull networking fixes from David Miller:

    1) sunhme driver lacks DMA mapping error checks, based upon a report by
    Meelis Roos.

    2) Fix memory leak in mvpp2 driver, from Sudip Mukherjee.

    3) DMA memory allocation sizes are wrong in systemport ethernet driver,
    fix from Florian Fainelli.

    4) Fix use after free in mac80211 defragmentation code, from Johannes
    Berg.

    5) Some networking uapi headers missing from Kbuild file, from Stephen
    Hemminger.

    6) TUN driver gets csum_start offset wrong when VLAN accel is enabled,
    and macvtap has a similar bug, from Herbert Xu.

    7) Adjust several tunneling drivers to set dev->iflink after
    registration, because registration sets that to -1, overwriting
    whatever we did. From Steffen Klassert.

    8) Geneve forgets to set inner tunneling type, causing GSO segmentation
    to fail on some NICs. From Jesse Gross.

    9) Fix several locking bugs in stmmac driver, from Fabrice Gasnier and
    Giuseppe CAVALLARO.

    10) Fix spurious timeouts with NewReno on low traffic connections, from
    Marcelo Leitner.

    11) Fix descriptor updates in enic driver, from Govindarajulu
    Varadarajan.

    12) PPP calls bpf_prog_create() with locks held, which isn't kosher.
    Fix from Takashi Iwai.

    13) Fix NULL deref in SCTP with malformed INIT packets, from Daniel
    Borkmann.

    14) psock_fanout selftest accesses past the end of the mmap ring, fix
    from Shuah Khan.

    15) Fix PTP timestamping for VLAN packets, from Richard Cochran.

    16) netlink_unbind() calls in netlink pass wrong initial argument, from
    Hiroaki SHIMODA.

    17) vxlan socket reuse accidentally reuses a socket when the address
    family is different, so we have to explicitly check this, from
    Marcelo Leitner.

    18) Fix missing include in nft_reject_bridge.c breaking the build on ppc
    and other architectures, from Guenter Roeck.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (75 commits)
    vxlan: Do not reuse sockets for a different address family
    smsc911x: power-up phydev before doing a software reset.
    lib: rhashtable - Remove weird non-ASCII characters from comments
    net/smsc911x: Fix delays in the PHY enable/disable routines
    net/smsc911x: Fix rare soft reset timeout issue due to PHY power-down mode
    netlink: Properly unbind in error conditions.
    net: ptp: fix time stamp matching logic for VLAN packets.
    cxgb4 : dcb open-lldp interop fixes
    selftests/net: psock_fanout seg faults in sock_fanout_read_ring()
    net: bcmgenet: apply MII configuration in bcmgenet_open()
    net: bcmgenet: connect and disconnect from the PHY state machine
    net: qualcomm: Fix dependency
    ixgbe: phy: fix uninitialized status in ixgbe_setup_phy_link_tnx
    net: phy: Correctly handle MII ioctl which changes autonegotiation.
    ipv6: fix IPV6_PKTINFO with v4 mapped
    net: sctp: fix memory leak in auth key management
    net: sctp: fix NULL pointer dereference in af->from_addr_param on malformed packet
    net: ppp: Don't call bpf_prog_create() in ppp_lock
    net/mlx4_en: Advertize encapsulation offloads features only when VXLAN tunnel is set
    cxgb4 : Fix bug in DCB app deletion
    ...

    Linus Torvalds
     
  • No reason to use BUG_ON for osd request list assertions.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Alex Elder

    Ilya Dryomov
     
  • kick_requests() can put linger requests on the notarget list. This
    means we need to clear the much-overloaded req->r_req_lru_item in
    __unregister_linger_request() as well, or we get an assertion failure
    in ceph_osdc_release_request() - !list_empty(&req->r_req_lru_item).

    AFAICT the assumption was that registered linger requests cannot be on
    any of req->r_req_lru_item lists, but that's clearly not the case.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Alex Elder

    Ilya Dryomov
     
  • Requests have to be unlinked from both osd->o_requests (normal
    requests) and osd->o_linger_requests (linger requests) lists when
    clearing req->r_osd. Otherwise __unregister_linger_request() gets
    confused and we trip over a !list_empty(&osd->o_linger_requests)
    assert in __remove_osd().

    MON=1 OSD=1:

    # cat remove-osd.sh
    #!/bin/bash
    rbd create --size 1 test
    DEV=$(rbd map test)
    ceph osd out 0
    sleep 3
    rbd map dne/dne # obtain a new osdmap as a side effect
    rbd unmap $DEV & # will block
    sleep 3
    ceph osd in 0

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Alex Elder

    Ilya Dryomov
     
  • Large (greater than 32k, the value of PAGE_ALLOC_COSTLY_ORDER) auth
    tickets will have their buffers vmalloc'ed, which leads to the
    following crash in crypto:

    [ 28.685082] BUG: unable to handle kernel paging request at ffffeb04000032c0
    [ 28.686032] IP: [] scatterwalk_pagedone+0x22/0x80
    [ 28.686032] PGD 0
    [ 28.688088] Oops: 0000 [#1] PREEMPT SMP
    [ 28.688088] Modules linked in:
    [ 28.688088] CPU: 0 PID: 878 Comm: kworker/0:2 Not tainted 3.17.0-vm+ #305
    [ 28.688088] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
    [ 28.688088] Workqueue: ceph-msgr con_work
    [ 28.688088] task: ffff88011a7f9030 ti: ffff8800d903c000 task.ti: ffff8800d903c000
    [ 28.688088] RIP: 0010:[] [] scatterwalk_pagedone+0x22/0x80
    [ 28.688088] RSP: 0018:ffff8800d903f688 EFLAGS: 00010286
    [ 28.688088] RAX: ffffeb04000032c0 RBX: ffff8800d903f718 RCX: ffffeb04000032c0
    [ 28.688088] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8800d903f750
    [ 28.688088] RBP: ffff8800d903f688 R08: 00000000000007de R09: ffff8800d903f880
    [ 28.688088] R10: 18df467c72d6257b R11: 0000000000000000 R12: 0000000000000010
    [ 28.688088] R13: ffff8800d903f750 R14: ffff8800d903f8a0 R15: 0000000000000000
    [ 28.688088] FS: 00007f50a41c7700(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000
    [ 28.688088] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    [ 28.688088] CR2: ffffeb04000032c0 CR3: 00000000da3f3000 CR4: 00000000000006b0
    [ 28.688088] Stack:
    [ 28.688088] ffff8800d903f698 ffffffff81392ca8 ffff8800d903f6e8 ffffffff81395d32
    [ 28.688088] ffff8800dac96000 ffff880000000000 ffff8800d903f980 ffff880119b7e020
    [ 28.688088] ffff880119b7e010 0000000000000000 0000000000000010 0000000000000010
    [ 28.688088] Call Trace:
    [ 28.688088] [] scatterwalk_done+0x38/0x40
    [ 28.688088] [] scatterwalk_done+0x38/0x40
    [ 28.688088] [] blkcipher_walk_done+0x182/0x220
    [ 28.688088] [] crypto_cbc_encrypt+0x15f/0x180
    [ 28.688088] [] ? crypto_aes_set_key+0x30/0x30
    [ 28.688088] [] ceph_aes_encrypt2+0x29c/0x2e0
    [ 28.688088] [] ceph_encrypt2+0x93/0xb0
    [ 28.688088] [] ceph_x_encrypt+0x4a/0x60
    [ 28.688088] [] ? ceph_buffer_new+0x5d/0xf0
    [ 28.688088] [] ceph_x_build_authorizer.isra.6+0x297/0x360
    [ 28.688088] [] ? kmem_cache_alloc_trace+0x11b/0x1c0
    [ 28.688088] [] ? ceph_auth_create_authorizer+0x36/0x80
    [ 28.688088] [] ceph_x_create_authorizer+0x63/0xd0
    [ 28.688088] [] ceph_auth_create_authorizer+0x54/0x80
    [ 28.688088] [] get_authorizer+0x80/0xd0
    [ 28.688088] [] prepare_write_connect+0x18b/0x2b0
    [ 28.688088] [] try_read+0x1e59/0x1f10

    This is because we set up crypto scatterlists as if all buffers were
    kmalloc'ed. Fix it.

    Cc: stable@vger.kernel.org
    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil

    Ilya Dryomov
     

13 Nov, 2014

1 commit

  • Even if netlink_kernel_cfg::unbind is implemented, the unbind() method is
    not called, because cfg->unbind is omitted in __netlink_kernel_create().
    Also fix the wrong argument to test_bit() and an off-by-one problem.

    At this point, no unbind() method is implemented, so there is no real
    issue.

    Fixes: 4f520900522f ("netlink: have netlink per-protocol bind function return an error code.")
    Signed-off-by: Hiroaki SHIMODA
    Cc: Richard Guy Briggs
    Acked-by: Richard Guy Briggs
    Signed-off-by: David S. Miller

    Hiroaki SHIMODA
     

12 Nov, 2014

3 commits

  • Use IS_ENABLED(CONFIG_IPV6) so that this code is also enabled when
    IPv6 is built as a module.
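
    As a rough illustration of why the macro matters: a tristate option set
    to "m" defines CONFIG_FOO_MODULE rather than CONFIG_FOO, so a plain
    #ifdef CONFIG_FOO misses the modular case. The sketch below models that
    in Python; defined_macros(), ifdef() and is_enabled() are hypothetical
    helpers, not kernel or Kconfig APIs.

```python
def defined_macros(config):
    """Map option values ('y', 'm', 'n') to the preprocessor names they define."""
    macros = set()
    for name, value in config.items():
        if value == "y":
            macros.add(name)               # built in
        elif value == "m":
            macros.add(name + "_MODULE")   # built as a module
    return macros


def ifdef(macros, name):
    """Models '#ifdef NAME': true only for built-in options."""
    return name in macros


def is_enabled(macros, name):
    """Models 'IS_ENABLED(NAME)': true for built-in or modular options."""
    return name in macros or (name + "_MODULE") in macros


modular = defined_macros({"CONFIG_IPV6": "m"})
builtin = defined_macros({"CONFIG_IPV6": "y"})
disabled = defined_macros({"CONFIG_IPV6": "n"})
```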

    Signed-off-by: Eric Dumazet
    Fixes: c8e6ad0829a7 ("ipv6: honor IPV6_PKTINFO with v4 mapped addresses on sendmsg")
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • A very minimal and simple user space application allocating an SCTP
    socket, setting SCTP_AUTH_KEY setsockopt(2) on it and then closing
    the socket again will leak the memory containing the authentication
    key from user space:

    unreferenced object 0xffff8800837047c0 (size 16):
    comm "a.out", pid 2789, jiffies 4296954322 (age 192.258s)
    hex dump (first 16 bytes):
    01 00 00 00 04 00 00 00 00 00 00 00 00 00 00 00 ................
    backtrace:
    [] kmemleak_alloc+0x4e/0xb0
    [] __kmalloc+0xe8/0x270
    [] sctp_auth_create_key+0x23/0x50 [sctp]
    [] sctp_auth_set_key+0xa1/0x140 [sctp]
    [] sctp_setsockopt+0xd03/0x1180 [sctp]
    [] sock_common_setsockopt+0x14/0x20
    [] SyS_setsockopt+0x71/0xd0
    [] system_call_fastpath+0x12/0x17
    [] 0xffffffffffffffff

    This is bad for two reasons: we can bring down a machine from user
    space when auth_enable=1, and we also leave security-sensitive keying
    material in memory without clearing it after use. The issue is that
    sctp_auth_create_key() already sets the refcount to 1, but after
    allocation sctp_auth_set_key() takes an additional reference on it,
    thus leaving it around when we free the socket.
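
    The refcount imbalance can be sketched as a small user-space model. This
    is not the SCTP code: the AuthKey class and the set_key_*() and
    close_socket() helpers are hypothetical stand-ins for the reference
    counting described above.

```python
class AuthKey:
    """Toy reference-counted key buffer (hypothetical)."""

    def __init__(self, data):
        self.data = bytearray(data)
        self.refcnt = 1        # creation already hands out one reference
        self.freed = False

    def hold(self):
        self.refcnt += 1

    def put(self):
        self.refcnt -= 1
        if self.refcnt == 0:
            # Zero the keying material before releasing it.
            for i in range(len(self.data)):
                self.data[i] = 0
            self.freed = True


def set_key_buggy(data):
    key = AuthKey(data)   # refcnt == 1
    key.hold()            # extra reference that nobody drops
    return key


def set_key_fixed(data):
    return AuthKey(data)  # the creation reference is the only one needed


def close_socket(key):
    key.put()             # the socket drops its single reference


leaked = set_key_buggy(b"secret")
close_socket(leaked)      # refcnt stays at 1: never freed, never zeroed

released = set_key_fixed(b"secret")
close_socket(released)    # refcnt reaches 0: zeroed and freed
```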

    Fixes: 65b07e5d0d0 ("[SCTP]: API updates to suport SCTP-AUTH extensions.")
    Signed-off-by: Daniel Borkmann
    Cc: Vlad Yasevich
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • An SCTP server doing ASCONF will panic on malformed INIT ping-of-death
    in the form of:

    ------------ INIT[PARAM: SET_PRIMARY_IP] ------------>

    While the INIT chunk parameter verification dissects through many things
    in order to detect malformed input, it fails to check parameters
    nested inside other parameters. E.g. RFC5061, section 4.2.4 proposes a
    'set primary IP address' parameter in ASCONF, which has an address
    parameter as a subparameter.

    So an attacker may send a parameter type other than SCTP_PARAM_IPV4_ADDRESS
    or SCTP_PARAM_IPV6_ADDRESS, param_type2af() will subsequently return 0
    and thus sctp_get_af_specific() returns NULL, too, which we then happily
    dereference unconditionally through af->from_addr_param().

    The trace for the log:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
    IP: [] sctp_process_init+0x492/0x990 [sctp]
    PGD 0
    Oops: 0000 [#1] SMP
    [...]
    Pid: 0, comm: swapper Not tainted 2.6.32-504.el6.x86_64 #1 Bochs Bochs
    RIP: 0010:[] [] sctp_process_init+0x492/0x990 [sctp]
    [...]
    Call Trace:

    [] ? sctp_bind_addr_copy+0x5d/0xe0 [sctp]
    [] sctp_sf_do_5_1B_init+0x21b/0x340 [sctp]
    [] sctp_do_sm+0x71/0x1210 [sctp]
    [] ? sctp_endpoint_lookup_assoc+0xc9/0xf0 [sctp]
    [] sctp_endpoint_bh_rcv+0x116/0x230 [sctp]
    [] sctp_inq_push+0x56/0x80 [sctp]
    [] sctp_rcv+0x982/0xa10 [sctp]
    [] ? ipt_local_in_hook+0x23/0x28 [iptable_filter]
    [] ? nf_iterate+0x69/0xb0
    [] ? ip_local_deliver_finish+0x0/0x2d0
    [] ? nf_hook_slow+0x76/0x120
    [] ? ip_local_deliver_finish+0x0/0x2d0
    [...]

    A minimal way to address this is to check for NULL as we do on all
    other such occasions where we know sctp_get_af_specific() could
    possibly return with NULL.
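
    A minimal sketch of that pattern, modelled in Python rather than taken
    from the SCTP code: a lookup that may return nothing is checked before
    its result is used. The handler table, get_af_specific() and
    process_set_primary() are hypothetical, and the numeric type codes are
    only illustrative.

```python
from typing import Callable, Dict, Optional

# Illustrative type codes for the two accepted address parameter kinds.
PARAM_IPV4_ADDRESS = 5
PARAM_IPV6_ADDRESS = 6

AF_HANDLERS: Dict[int, Callable[[bytes], str]] = {
    PARAM_IPV4_ADDRESS: lambda raw: ".".join(str(b) for b in raw[:4]),
    PARAM_IPV6_ADDRESS: lambda raw: raw[:16].hex(),
}


def get_af_specific(param_type: int) -> Optional[Callable[[bytes], str]]:
    """Return the address handler, or None for an unknown parameter type."""
    return AF_HANDLERS.get(param_type)


def process_set_primary(param_type: int, raw: bytes) -> Optional[str]:
    handler = get_af_specific(param_type)
    if handler is None:
        # Malformed subparameter: bail out instead of calling through None.
        return None
    return handler(raw)


good = process_set_primary(PARAM_IPV4_ADDRESS, bytes([10, 0, 0, 1]))
bad = process_set_primary(0x1234, b"")
```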

    Fixes: d6de3097592b ("[SCTP]: Add the handling of "Set Primary IP Address" parameter to INIT")
    Signed-off-by: Daniel Borkmann
    Cc: Vlad Yasevich
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

11 Nov, 2014

1 commit

  • When doing GRO processing for UDP tunnels, we never add
    SKB_GSO_UDP_TUNNEL to gso_type - only the type of the inner protocol
    is added (such as SKB_GSO_TCPV4). The result is that if the packet is
    later resegmented we will do GSO but not treat it as a tunnel. This
    results in UDP fragmentation of the outer header instead of (e.g.) TCP
    segmentation of the inner header as was originally on the wire.
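
    The effect of the missing flag can be modelled with plain bit flags. This
    is a sketch under stated assumptions: the flag values are arbitrary (only
    the names mirror the kernel constants), and gro_complete_*() and
    resegment_plan() are hypothetical helpers describing a packet that is
    known to be UDP-encapsulated.

```python
from enum import IntFlag, auto


class GsoType(IntFlag):
    # Arbitrary values; only the names follow the constants in the text.
    TCPV4 = auto()
    UDP_TUNNEL = auto()


def gro_complete_buggy(inner):
    # Only the inner protocol type is recorded.
    return inner


def gro_complete_fixed(inner):
    # Also record that the packet arrived inside a UDP tunnel.
    return inner | GsoType.UDP_TUNNEL


def resegment_plan(gso_type):
    """Decide how a UDP-encapsulated packet would be split again."""
    if gso_type & GsoType.UDP_TUNNEL:
        return "segment inner TCP"
    return "fragment outer UDP"


buggy_plan = resegment_plan(gro_complete_buggy(GsoType.TCPV4))
fixed_plan = resegment_plan(gro_complete_fixed(GsoType.TCPV4))
```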

    Signed-off-by: Jesse Gross
    Signed-off-by: David S. Miller

    Jesse Gross
     

07 Nov, 2014

2 commits

  • John W. Linville says:

    ====================
    pull request: wireless 2014-11-06

    Please pull this batch of fixes intended for the 3.18 stream...

    For the mac80211 bits, Johannes says:

    "This contains another small set of fixes for 3.18, these are all
    over the place and most of the bugs are old, one even dates back
    to the original mac80211 we merged into the kernel."

    For the iwlwifi bits, Emmanuel says:

    "I fix here two issues that are related to the firmware
    loading flow. A user reported that he couldn't load the
    driver because the rfkill line was pulled up while we
    were running the calibrations. This was happening while
    booting the system: systemd was restoring the "disable
    wifi settings" and that raised an RFKILL interrupt during
    the calibration. Our driver didn't handle that properly
    and this is now fixed."

    Please let me know if there are problems!
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • When the ports' PHYs are connected to the switch's internal MDIO bus,
    we need to connect the PHY to the slave netdev, otherwise
    auto-negotiation etc. does not work.

    Signed-off-by: Andrew Lunn
    Signed-off-by: David S. Miller

    Andrew Lunn
     

06 Nov, 2014

3 commits

  • Ueki Kohei reported that when we are using NewReno with connections that
    have very low traffic, we may time out the connection too early if a
    second loss occurs after the first one was successfully acked but no
    data was transferred later. Below is his description of it:

    When SACK is disabled, and a socket suffers multiple separate TCP
    retransmissions, that socket's ETIMEDOUT value is calculated from the
    time of the *first* retransmission instead of the *latest*
    retransmission.

    This happens because the tcp_sock's retrans_stamp is set once then never
    cleared.
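
    The timing problem can be sketched as a toy model. The Conn class, its
    method names and the numbers are hypothetical; clearing the stamp once
    everything is acked is shown only as one plausible remedy, assumed here
    for illustration rather than taken from the patch.

```python
class Conn:
    """Toy connection tracking when the current retransmission run began."""

    def __init__(self, clear_on_ack):
        self.clear_on_ack = clear_on_ack
        self.retrans_stamp = 0   # 0 means "no retransmission in progress"

    def on_retransmit(self, now):
        if self.retrans_stamp == 0:
            self.retrans_stamp = now   # set once per run

    def on_all_acked(self):
        if self.clear_on_ack:
            self.retrans_stamp = 0     # assumed remedy: forget the old run

    def timed_out(self, now, limit):
        return self.retrans_stamp != 0 and now - self.retrans_stamp >= limit


LIMIT = 900  # arbitrary timeout budget, in seconds


def run(conn):
    conn.on_retransmit(now=1)      # first loss
    conn.on_all_acked()            # recovered; connection then idles
    conn.on_retransmit(now=1001)   # second, unrelated loss much later
    return conn.timed_out(now=1002, limit=LIMIT)


stale = run(Conn(clear_on_ack=False))   # measured from the first loss
fresh = run(Conn(clear_on_ack=True))    # measured from the latest loss
```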

    Take the following connection:

                      Linux    remote-machine
                        |           |
         send#1---->(*1)|--------> data#1 --------->|
                  |     |           |
                 RTO    :           :
                  |     |           |
                 ---(*2)|----> data#1(retrans) ---->|
                  | (*3)|(*4)|--------> data#2 --------->|
                  |     |           |
                 RTO    :           :
                  |     |           |
                 ---(*5)|----> data#2(retrans) ---->|
                  |     |           |
                  |     |           |
                RTO*2   :           :
                  |     |           |
                  |     |           |
    ETIMEDOUT

    Cc: Yuchung Cheng
    Signed-off-by: Marcelo Ricardo Leitner
    Acked-by: Neal Cardwell
    Tested-by: Neal Cardwell
    Signed-off-by: David S. Miller

    Marcelo Leitner
     
  • The pernet ops are never unregistered, which causes a memory
    leak and an oops if the module is ever reinserted.

    Fixes: 0b5e8b8eeae4 ("net: Add Geneve tunneling protocol driver")
    CC: Andy Zhou
    Signed-off-by: Jesse Gross
    Acked-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    Jesse Gross
     
  • Geneve does not currently set the inner protocol type when
    transmitting packets. This causes GSO segmentation to fail on NICs
    that do not support Geneve offloading.

    CC: Andy Zhou
    Signed-off-by: Jesse Gross
    Signed-off-by: David S. Miller

    Jesse Gross
     

05 Nov, 2014

1 commit


04 Nov, 2014

6 commits

  • Pull ceph fixes from Sage Weil:
    "There is a GFP flag fix from Mike Christie, an error code fix from
    Jan, and fixes for two unnecessary allocations (kmalloc and workqueue)
    from Ilya. All are well tested.

    Ilya has one other fix on the way but it didn't get tested in time"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
    libceph: eliminate unnecessary allocation in process_one_ticket()
    rbd: Fix error recovery in rbd_obj_read_sync()
    libceph: use memalloc flags for net IO
    rbd: use a single workqueue for all devices

    Linus Torvalds
     
  • Otherwise it gets overwritten by register_netdev().

    Signed-off-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Steffen Klassert
     
  • ipip6_tunnel_init() sets the dev->iflink via a call to
    ipip6_tunnel_bind_dev(). After that, register_netdevice()
    sets dev->iflink = -1. So we lose the iflink configuration
    for ipv6 tunnels. Fix this by using ipip6_tunnel_init() as the
    ndo_init function. Then ipip6_tunnel_init() is called after
    dev->iflink is set to -1 from register_netdevice().
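
    The ordering problem can be modelled in a few lines. This is a sketch,
    not the netdev core: NetDevice, register_netdevice() and tunnel_init()
    are simplified hypothetical stand-ins, and LOWER_IFINDEX is an arbitrary
    value. The model only encodes what the text states, namely that
    registration resets iflink to -1 and then runs the ndo_init callback.

```python
LOWER_IFINDEX = 7   # arbitrary index of the underlying device


class NetDevice:
    def __init__(self, ndo_init=None):
        self.iflink = 0
        self.ndo_init = ndo_init


def register_netdevice(dev):
    dev.iflink = -1              # registration resets the field...
    if dev.ndo_init is not None:
        dev.ndo_init(dev)        # ...and only then runs the init callback


def tunnel_init(dev):
    dev.iflink = LOWER_IFINDEX   # bind the tunnel to its lower device


# Before: init runs first, then registration overwrites the result.
early = NetDevice()
tunnel_init(early)
register_netdevice(early)

# After: init is installed as ndo_init, so it runs after the reset.
late = NetDevice(ndo_init=tunnel_init)
register_netdevice(late)
```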

    Signed-off-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Steffen Klassert
     
  • vti6_dev_init() sets the dev->iflink via a call to
    vti6_link_config(). After that, register_netdevice()
    sets dev->iflink = -1. So we lose the iflink configuration
    for vti6 tunnels. Fix this by using vti6_dev_init() as the
    ndo_init function. Then vti6_dev_init() is called after
    dev->iflink is set to -1 from register_netdevice().

    Signed-off-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Steffen Klassert
     
  • ip6_tnl_dev_init() sets the dev->iflink via a call to
    ip6_tnl_link_config(). After that, register_netdevice()
    sets dev->iflink = -1. So we lose the iflink configuration
    for ipv6 tunnels. Fix this by using ip6_tnl_dev_init() as the
    ndo_init function. Then ip6_tnl_dev_init() is called after
    dev->iflink is set to -1 from register_netdevice().

    Signed-off-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Steffen Klassert
     
  • Fix:
    net/bridge/netfilter/nft_reject_bridge.c:
    In function 'nft_reject_br_send_v6_unreach':
    net/bridge/netfilter/nft_reject_bridge.c:240:3:
    error: implicit declaration of function 'csum_ipv6_magic'
    csum_ipv6_magic(&nip6h->saddr, &nip6h->daddr,
    ^
    make[3]: *** [net/bridge/netfilter/nft_reject_bridge.o] Error 1

    Seen with powerpc:allmodconfig.

    Fixes: 523b929d5446 ("netfilter: nft_reject_bridge: don't use IP stack to reject traffic")
    Cc: Pablo Neira Ayuso
    Signed-off-by: Guenter Roeck
    Signed-off-by: David S. Miller

    Guenter Roeck
     

03 Nov, 2014

2 commits

  • Upon receiving the last fragment, all but the first fragment
    are freed, but the multicast check for statistics at the end
    of the function refers to the current skb (the last fragment)
    causing a use-after-free bug.

    Since multicast frames cannot be fragmented and we check for
    this early in the function, just modify that check to also
    do the accounting to fix the issue.

    Cc: stable@vger.kernel.org
    Reported-by: Yosef Khyal
    Signed-off-by: Johannes Berg

    Johannes Berg
     
  • The sk_prot is irda's own set of protocol handlers, so irda should
    statically know what that function is anyway, without using an indirect
    pointer. And as it happens, we know *exactly* what that pointer is
    statically: it's NULL, because irda doesn't define a disconnect
    operation.

    So calling that function is doubly wrong, and will just cause an oops.

    Reported-by: Martin Lang
    Cc: Samuel Ortiz
    Cc: David Miller
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

01 Nov, 2014

5 commits

  • Commit c27a3e4d667f ("libceph: do not hard code max auth ticket len"),
    while fixing a buffer overflow, tried to keep as much of the
    surrounding code the same as possible and introduced an unnecessary
    kmalloc() in the unencrypted ticket path. It is likely to fail on huge
    tickets, so get rid of it.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil

    Ilya Dryomov
     
  • If a driver supports reading EEPROM but no EEPROM is installed in the system,
    the driver's get_eeprom_len function returns 0. ethtool will subsequently
    try to read that zero-length EEPROM anyway. If the driver does not support
    EEPROM access at all, this operation will return -EOPNOTSUPP. If the driver
    does support EEPROM access but no EEPROM is installed, the operation will
    return -EINVAL. Return -EOPNOTSUPP in both cases for consistency.
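
    The two return paths can be sketched as follows. This is a user-space
    model, not the ethtool core: get_eeprom() and its parameters are
    hypothetical, and the consistent flag exists only to contrast the old
    and new behaviour.

```python
import errno


def get_eeprom(supports_eeprom, eeprom_len, consistent=True):
    """Return 0 on success or a negative errno, kernel style."""
    if not supports_eeprom:
        return -errno.EOPNOTSUPP          # driver has no EEPROM support
    if eeprom_len == 0:
        # Supported by the driver, but no EEPROM is installed.
        return -errno.EOPNOTSUPP if consistent else -errno.EINVAL
    return 0


old_no_eeprom = get_eeprom(True, 0, consistent=False)
new_no_eeprom = get_eeprom(True, 0)
unsupported = get_eeprom(False, 0)
present = get_eeprom(True, 256)
```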

    Signed-off-by: Guenter Roeck
    Tested-by: Andrew Lunn
    Signed-off-by: David S. Miller

    Guenter Roeck
     
  • Kconfig already allows mpls to be built as a module. The following
    patch fixes the Makefile to do the same.

    CC: Simon Horman
    Signed-off-by: Pravin B Shelar
    Acked-by: Simon Horman
    Signed-off-by: David S. Miller

    Pravin B Shelar
     
  • The mpls gso handler needs to pull the skb after segmenting it.

    CC: Simon Horman
    Signed-off-by: Pravin B Shelar
    Acked-by: Simon Horman
    Signed-off-by: David S. Miller

    Pravin B Shelar
     
  • Pablo Neira Ayuso says:

    ====================
    netfilter/ipvs fixes for net

    The following patchset contains fixes for netfilter/ipvs. This round of
    fixes is larger than usual at this stage, specifically because of the
    nf_tables bridge reject fixes that I would like to see in 3.18. The
    patches are:

    1) Fix a null-pointer dereference that may occur when logging
    errors. This problem was introduced by 4a4739d56b0 ("ipvs: Pull
    out crosses_local_route_boundary logic") in v3.17-rc5.

    2) Update hook mask in nft_reject_bridge so we can also filter out
    packets from there. This fixes 36d2af5 ("netfilter: nf_tables: allow
    to filter from prerouting and postrouting"), which needs this chunk
    to work.

    3) Two patches to refactor common code to forge the IPv4 and IPv6
    reject packets from the bridge. These are required by the nf_tables
    reject bridge fix.

    4) Fix nft_reject_bridge by avoiding the use of the IP stack to reject
    packets from the bridge. The idea is to forge the reject packets and
    inject them to the original port via br_deliver() which is now
    exported for that purpose.

    5) Restrict nft_reject_bridge to bridge prerouting and input hooks.
    The original skbuff may be cloned after prerouting when the bridge
    stack needs to flood it to several bridge ports, at which point it is
    too late to reject the traffic.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

31 Oct, 2014

9 commits

  • Restrict the reject expression to the prerouting and input bridge
    hooks. If we allow this to be used from forward or any other later
    bridge hook and the frame is flooded to several ports, we'll end up
    sending several reject packets, one per cloned packet.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • If the packet is received via the bridge stack, this cannot reject
    packets from the IP stack.

    This adds functions to build the reject packet and send it from the
    bridge stack. Comments and assumptions on this patch:

    1) Validate the IPv4 and IPv6 headers before further processing;
    given that the packet comes from the bridge stack, we cannot assume
    they are clean. Truncated packets are dropped; we follow an approach
    similar to the existing iptables match/target extensions that need
    to inspect layer 4 headers that are not available. This also includes
    packets that are directed to multicast and broadcast ethernet
    addresses.

    2) br_deliver() is exported to inject the reject packet via
    bridge localout -> postrouting. So the approach is similar to what
    we already do in the iptables reject target. The reject packet is
    sent to the bridge port from which we have received the original
    packet.

    3) The reject packet is forged based on the original packet. The TTL
    is set based on sysctl_ip_default_ttl for IPv4 and per-net
    ipv6.devconf_all hoplimit for IPv6.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • That can be reused by the reject bridge expression to build the reject
    packet. The new functions are:

    * nf_reject_ip6_tcphdr_get(): to sanitize and to obtain the TCP header.
    * nf_reject_ip6hdr_put(): to build the IPv6 header.
    * nf_reject_ip6_tcphdr_put(): to build the TCP header.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • That can be reused by the reject bridge expression to build the reject
    packet. The new functions are:

    * nf_reject_ip_tcphdr_get(): to sanitize and to obtain the TCP header.
    * nf_reject_iphdr_put(): to build the IPv4 header.
    * nf_reject_ip_tcphdr_put(): to build the TCP header.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • Fixes: 36d2af5 ("netfilter: nf_tables: allow to filter from prerouting and postrouting")
    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • UFO is now disabled on all drivers that work with virtio net headers,
    but userland may try to send UFO/IPv6 packets anyway. Instead of
    sending with ID=0, we should select identifiers on their behalf (as we
    used to).

    Signed-off-by: Ben Hutchings
    Fixes: 916e4cf46d02 ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data")
    Signed-off-by: David S. Miller

    Ben Hutchings
     
  • Some drivers are unable to perform TX completions in a bound time.
    They instead call skb_orphan().

    The problem is that skb_fclone_busy() has to detect this case; otherwise
    we block TCP retransmits and can freeze unlucky TCP sessions on
    mostly idle hosts.

    Signed-off-by: Eric Dumazet
    Fixes: 1f3279ae0c13 ("tcp: avoid retransmits of TCP packets hanging in host queues")
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Currently, skb_inner_network_header is used but this does not account
    for Ethernet header for ETH_P_TEB. Use skb_inner_mac_header which
    handles TEB and also should work with IP encapsulation in which case
    inner mac and inner network headers are the same.

    Tested: Ran TCP_STREAM over GRE, worked as expected.

    Signed-off-by: Tom Herbert
    Acked-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Tom Herbert
     
  • If we cache them, the kernel will reuse them, independently of
    whether forwarding is enabled or not. Which means that if forwarding is
    disabled on the input interface where the first routing request comes
    from, then that unreachable result will be cached and reused for
    other interfaces, even if forwarding is enabled on them. The opposite
    is also true.

    This can be verified with two interfaces A and B and an output interface
    C, where B has forwarding enabled but A does not, by trying:
    ip route get $dst iif A from $src && ip route get $dst iif B from $src
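
    The caching hazard can be sketched with a toy lookup. Everything here is
    hypothetical: the FORWARDING table, the cache keyed by destination and
    the make_*_lookup() helpers are a simplified model, and the "fixed"
    variant shows only one possible way to keep the per-interface decision
    out of the shared cache.

```python
FORWARDING = {"A": False, "B": True}   # per-input-interface setting


def make_buggy_lookup():
    cache = {}

    def lookup(dst, iif):
        if dst not in cache:
            # First caller decides; the result is shared by every interface.
            cache[dst] = "via C" if FORWARDING[iif] else "unreachable"
        return cache[dst]

    return lookup


def make_fixed_lookup():
    cache = {}

    def lookup(dst, iif):
        if not FORWARDING[iif]:
            return "unreachable"       # decided per interface, never cached
        if dst not in cache:
            cache[dst] = "via C"
        return cache[dst]

    return lookup


buggy = make_buggy_lookup()
buggy_results = [buggy("dst", "A"), buggy("dst", "B")]

reverse = make_buggy_lookup()
reverse_results = [reverse("dst", "B"), reverse("dst", "A")]

fixed = make_fixed_lookup()
fixed_results = [fixed("dst", "A"), fixed("dst", "B"), fixed("dst", "A")]
```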

    Signed-off-by: Nicolas Cavallari
    Reviewed-by: Julian Anastasov
    Signed-off-by: David S. Miller

    Nicolas Cavallari
     

30 Oct, 2014

2 commits

  • When an interface is deleted, an ongoing hardware scan is canceled and
    the driver must abort the scan, at the very least reporting completion
    while the interface is removed.

    However, if it scheduled the work, that work might only run after
    everything is said and done, which leads to cfg80211 warning that the
    scan isn't reported as finished yet; this is no fault of the driver, it
    already did, but mac80211 hasn't processed it.

    To fix this situation, flush the delayed work when the interface being
    removed is the one that was executing the scan.

    Cc: stable@vger.kernel.org
    Reported-by: Sujith Manoharan
    Tested-by: Sujith Manoharan
    Signed-off-by: Johannes Berg

    Johannes Berg
     
  • This patch has ceph's lib code use the memalloc flags.

    If the VM layer needs to write data out to free up memory to handle new
    allocation requests, the block layer must be able to make forward progress.
    To handle that requirement we use structs like mempools to reserve memory for
    objects like bios and requests.

    The problem is that when we send/receive block layer requests over the
    network layer, net skb allocations can fail and the system can lock up.
    To solve this, the memalloc related flags were added. NBD, iSCSI
    and NFS use these flags to tell the network/vm layer that it should
    use memory reserves to fulfill allocation requests for structs like
    skbs.

    I am running ceph in a bunch of VMs on my laptop, so this patch was
    not tested very harshly.

    Signed-off-by: Mike Christie
    Reviewed-by: Ilya Dryomov

    Mike Christie