25 Sep, 2015

1 commit

  • When using ip lwtunnels, the additional data for xmit (basically, the actual
    tunnel to use) are carried in ip_tunnel_info either in dst->lwtstate or in
    metadata dst. When replying to ARP requests, we need to send the reply to
    the same tunnel the request came from. This means we need to construct
    proper metadata dst for ARP replies.

    We could perform another route lookup to get a dst entry with the correct
    lwtstate. However, this won't always ensure that the outgoing tunnel is the
    same as the incoming one, and it won't work anyway for IPv4 duplicate
    address detection.

    The only thing to do is to "reverse" the ip_tunnel_info.

    Signed-off-by: Jiri Benc
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Jiri Benc
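The "reverse" step described in this commit can be pictured as swapping the endpoints of the tunnel key, so that the reply leaves through the tunnel the request arrived on. The following is a minimal sketch in plain C. `struct tun_key` and `tun_key_reverse()` are hypothetical, simplified stand-ins and not the kernel's `ip_tunnel_info` layout; swapping the transport ports as well is an assumption of this sketch.

```c
#include <stdint.h>

/* Hypothetical, simplified tunnel key; not the kernel structure. */
struct tun_key {
    uint64_t tun_id;   /* tunnel identifier, kept unchanged */
    uint32_t src;      /* outer source address */
    uint32_t dst;      /* outer destination address */
    uint16_t tp_src;   /* outer transport source port */
    uint16_t tp_dst;   /* outer transport destination port */
};

/* Build the key for a reply from the key of the received request:
 * same tunnel id, endpoints swapped. */
static struct tun_key tun_key_reverse(const struct tun_key *rx)
{
    struct tun_key tx = *rx;

    tx.src = rx->dst;
    tx.dst = rx->src;
    tx.tp_src = rx->tp_dst;
    tx.tp_dst = rx->tp_src;
    return tx;
}
```

No route lookup is involved in this sketch, which is why the same idea also covers replies that have no usable route, such as IPv4 duplicate address detection.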
     

22 Sep, 2015

1 commit

  • When creating a timewait socket, we need to arm the timer before
    allowing other cpus to find it. The signal allowing cpus to find
    the socket is setting tw_refcnt to a non-zero value.

    As we set tw_refcnt in __inet_twsk_hashdance(), we therefore need to
    call inet_twsk_schedule() first.

    This also means we need to remove tw_refcnt changes from
    inet_twsk_schedule() and let the caller handle it.

    Note that because we use mod_timer_pinned(), we have the guarantee
    the timer won't expire before we set tw_refcnt as we run in BH context.

    To make things more readable I introduced the inet_twsk_reschedule() helper.

    When rearming the timer, we can use mod_timer_pending() to make sure
    we do not rearm a canceled timer.

    Note: This bug can possibly trigger if packets of a flow can hit
    multiple cpus. This does not normally happen, unless flow steering
    is broken somehow. This explains why this bug was spotted ~5 months after
    its introduction.

    A similar fix is needed for SYN_RECV sockets in reqsk_queue_hash_req(),
    but will be provided in a separate patch for proper tracking.

    Fixes: 789f558cfb36 ("tcp/dccp: get rid of central timewait timer")
    Signed-off-by: Eric Dumazet
    Reported-by: Ying Cai
    Signed-off-by: David S. Miller

    Eric Dumazet
     

21 Sep, 2015

2 commits

  • Like the previous patch, which fixes ipv4 tunnels, here is the ipv6 part.

    Before the patch, the external ipv6 header + gre header were included on
    tx.

    After the patch:
    $ ping -c1 192.168.6.121 ; ip -s l ls dev ip6gre1
    PING 192.168.6.121 (192.168.6.121) 56(84) bytes of data.
    64 bytes from 192.168.6.121: icmp_req=1 ttl=64 time=1.92 ms

    --- 192.168.6.121 ping statistics ---
    1 packets transmitted, 1 received, 0% packet loss, time 0ms
    rtt min/avg/max/mdev = 1.923/1.923/1.923/0.000 ms
    7: ip6gre1@NONE: mtu 1440 qdisc noqueue state UNKNOWN mode DEFAULT group default
    link/gre6 20:01:06:60:30:08:c1:c3:00:00:00:00:00:00:01:23 peer 20:01:06:60:30:08:c1:c3:00:00:00:00:00:00:01:21
    RX: bytes packets errors dropped overrun mcast
    84 1 0 0 0 0
    TX: bytes packets errors dropped carrier collsns
    84 1 0 0 0 0
    $ ping -c1 192.168.1.121 ; ip -s l ls dev ip6tnl1
    PING 192.168.1.121 (192.168.1.121) 56(84) bytes of data.
    64 bytes from 192.168.1.121: icmp_req=1 ttl=64 time=2.28 ms

    --- 192.168.1.121 ping statistics ---
    1 packets transmitted, 1 received, 0% packet loss, time 0ms
    rtt min/avg/max/mdev = 2.288/2.288/2.288/0.000 ms
    8: ip6tnl1@NONE: mtu 1452 qdisc noqueue state UNKNOWN mode DEFAULT group default
    link/tunnel6 2001:660:3008:c1c3::123 peer 2001:660:3008:c1c3::121
    RX: bytes packets errors dropped overrun mcast
    84 1 0 0 0 0
    TX: bytes packets errors dropped carrier collsns
    84 1 0 0 0 0

    Signed-off-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    Nicolas Dichtel
     
  • The man page of ip-route(8) says the following about route types:

    unreachable - these destinations are unreachable. Packets are discarded
    and the ICMP message host unreachable is generated. The local
    senders get an EHOSTUNREACH error.

    blackhole - these destinations are unreachable. Packets are discarded
    silently. The local senders get an EINVAL error.

    prohibit - these destinations are unreachable. Packets are discarded
    and the ICMP message communication administratively prohibited is
    generated. The local senders get an EACCES error.

    In the inet6 address family, this was correct, except that local senders
    got an ENETUNREACH error instead of EHOSTUNREACH for an unreachable route.
    In the inet address family, all three route types generated the ICMP message
    net unreachable, and local senders got an ENETUNREACH error.

    In both address families all three route types now behave consistently
    with the documentation.

    Signed-off-by: Nikola Forró
    Signed-off-by: David S. Miller

    Nikola Forró
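The documented behaviour quoted in this commit boils down to a small mapping from route type to the error a local sender sees. A minimal sketch in C follows; `enum route_type` and `route_type_errno()` are hypothetical names for illustration (the kernel uses its own `RTN_*` route types and tables).

```c
#include <errno.h>

/* Hypothetical enum for illustration only. */
enum route_type { ROUTE_UNREACHABLE, ROUTE_BLACKHOLE, ROUTE_PROHIBIT };

/* Error seen by a local sender, as documented in ip-route(8). */
static int route_type_errno(enum route_type type)
{
    switch (type) {
    case ROUTE_UNREACHABLE:
        return EHOSTUNREACH;  /* ICMP host unreachable is generated */
    case ROUTE_BLACKHOLE:
        return EINVAL;        /* packets are discarded silently */
    case ROUTE_PROHIBIT:
        return EACCES;        /* ICMP administratively prohibited */
    }
    return 0;                 /* not one of the three reject types */
}
```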
     

18 Sep, 2015

2 commits

  • Steffen reported that the recent change to add oif to dst lookups breaks
    the VTI use case. The problem is that with the oif set in the flow struct
    the comparison to the nh_oif is triggered. Fix by splitting the
    FLOWI_FLAG_VRFSRC into 2 flags -- one that triggers the vrf device cache
    bypass (FLOWI_FLAG_VRFSRC) and another telling the lookup to not compare
    nh oif (FLOWI_FLAG_SKIP_NH_OIF).

    Fixes: 42a7b32b73d6 ("xfrm: Add oif to dst lookups")

    Signed-off-by: David Ahern
    Acked-by: Steffen Klassert
    Signed-off-by: David S. Miller

    David Ahern
     
  • This patch adds NLM_F_REPLACE flag to ipv6 route replace notifications.
    This makes nlm_flags in ipv6 replace notifications consistent
    with ipv4.

    Signed-off-by: Roopa Prabhu
    Acked-by: Nicolas Dichtel
    Reviewed-by: Michal Kubecek
    Signed-off-by: David S. Miller

    Roopa Prabhu
     

16 Sep, 2015

3 commits

  • This patch uses a seqlock to ensure consistency between idst->dst and
    idst->cookie. It also makes dst freeing from the fib tree undergo an
    RCU grace period.

    Signed-off-by: Martin KaFai Lau
    Signed-off-by: David S. Miller

    Martin KaFai Lau
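The seqlock idea in this commit can be sketched in userspace with C11 atomics: the writer makes the sequence counter odd while it updates the pair, and a reader retries whenever it saw an odd or changed counter. This is only an illustration under stated assumptions. `struct dst_cache`, `cache_write()` and `cache_read()` are hypothetical names, a single writer is assumed, and the memory ordering follows a common userspace pattern rather than the kernel's `seqlock_t` implementation.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical cached pair that must always be read consistently. */
struct dst_cache {
    atomic_uint seq;          /* even: stable, odd: write in progress */
    _Atomic(void *) dst;
    _Atomic uint32_t cookie;
};

/* Single writer assumed; concurrent writers would need a lock. */
static void cache_write(struct dst_cache *c, void *dst, uint32_t cookie)
{
    unsigned int s = atomic_load_explicit(&c->seq, memory_order_relaxed);

    atomic_store_explicit(&c->seq, s + 1, memory_order_relaxed); /* odd */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&c->dst, dst, memory_order_relaxed);
    atomic_store_explicit(&c->cookie, cookie, memory_order_relaxed);
    atomic_store_explicit(&c->seq, s + 2, memory_order_release); /* even */
}

/* Retry until dst and cookie were read under one stable, even sequence. */
static void cache_read(struct dst_cache *c, void **dst, uint32_t *cookie)
{
    unsigned int s1, s2;

    do {
        s1 = atomic_load_explicit(&c->seq, memory_order_acquire);
        *dst = atomic_load_explicit(&c->dst, memory_order_relaxed);
        *cookie = atomic_load_explicit(&c->cookie, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        s2 = atomic_load_explicit(&c->seq, memory_order_relaxed);
    } while ((s1 & 1u) || s1 != s2);
}
```

Readers never block the writer in this scheme; they only pay for a retry when an update races with them.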
     
  • Problems in the current dst_entry cache in the ip6_tunnel:

    1. ip6_tnl_dst_set is racy. There is no lock to protect it:
    - One major problem is that the dst refcnt gets messed up. For example,
    the same dst_cache can be released multiple times, which then
    triggers the infamous dst refcnt < 0 warning message.
    - Another issue is the inconsistency between dst_cache and
    dst_cookie.

    It can be reproduced by adding and removing the ip6gre tunnel
    while running a super_netperf TCP_CRR test.

    2. ip6_tnl_dst_get does not take the dst refcnt before returning
    the dst.

    This patch:
    1. Create a percpu dst_entry cache in ip6_tnl
    2. Use a spinlock to protect the dst_cache operations
    3. ip6_tnl_dst_get always takes the dst refcnt before returning

    Signed-off-by: Martin KaFai Lau
    Signed-off-by: David S. Miller

    Martin KaFai Lau
     
  • This is prep work to fix the dst_entry refcnt bugs in
    ip6_tunnel.

    This patch renames:
    1. ip6_tnl_dst_check() to ip6_tnl_dst_get() to better
    reflect that it will take a dst refcnt in the next patch.
    2. ip6_tnl_dst_store() to ip6_tnl_dst_set() to have a more
    conventional name matching with ip6_tnl_dst_get().

    Signed-off-by: Martin KaFai Lau
    Signed-off-by: David S. Miller

    Martin KaFai Lau
     

11 Sep, 2015

1 commit

  • Pull networking fixes from David Miller:

    1) Fix out-of-bounds array access in netfilter ipset, from Jozsef
    Kadlecsik.

    2) Use correct free operation on netfilter conntrack templates, from
    Daniel Borkmann.

    3) Fix route leak in SCTP, from Marcelo Ricardo Leitner.

    4) Fix sizeof(pointer) in mac80211, from Thierry Reding.

    5) Fix cache pointer comparison in ip6mr leading to missed unlock of
    mrt_lock. From Richard Laing.

    6) rds_conn_lookup() needs to consider network namespace in key
    comparison, from Sowmini Varadhan.

    7) Fix deadlock in TIPC code wrt broadcast link wakeups, from Kolmakov
    Dmitriy.

    8) Fix fd leaks in bpf syscall, from Daniel Borkmann.

    9) Fix error recovery when installing ipv6 multipath routes, we would
    delete the old route before we would know if we could fully commit
    to the new set of nexthops. Fix from Roopa Prabhu.

    10) Fix run-time suspend problems in r8152, from Hayes Wang.

    11) In fec, don't program the MAC address into the chip when the clocks
    are gated off. From Fugang Duan.

    12) Fix poll behavior for netlink sockets when using rx ring mmap, from
    Daniel Borkmann.

    13) Don't allocate memory with GFP_KERNEL from get_stats64 in r8169
    driver, from Corinna Vinschen.

    14) In TCP Cubic congestion control, handle idle periods better where we
    are application limited, in order to keep cwnd from growing out of
    control. From Eric Dumazet.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits)
    tcp_cubic: better follow cubic curve after idle period
    tcp: generate CA_EVENT_TX_START on data frames
    xen-netfront: respect user provided max_queues
    xen-netback: respect user provided max_queues
    r8169: Fix sleeping function called during get_stats64, v2
    ether: add IEEE 1722 ethertype - TSN
    netlink, mmap: fix edge-case leakages in nf queue zero-copy
    netlink, mmap: don't walk rx ring on poll if receive queue non-empty
    cxgb4: changes for new firmware 1.14.4.0
    net: fec: add netif status check before set mac address
    r8152: fix the runtime suspend issues
    r8152: split DRIVER_VERSION
    ipv6: fix ifnullfree.cocci warnings
    add microchip LAN88xx phy driver
    stmmac: fix check for phydev being open
    net: qlcnic: delete redundant memsets
    net: mv643xx_eth: use kzalloc
    net: jme: use kzalloc() instead of kmalloc+memset
    net: cavium: liquidio: use kzalloc in setup_glist()
    net: ipv6: use common fib_default_rule_pref
    ...

    Linus Torvalds
     

10 Sep, 2015

1 commit

  • This switches IPv6 policy routing to use the shared
    fib_default_rule_pref() function of IPv4 and DECnet. It is also used in
    multicast routing for IPv4 as well as IPv6.

    The motivation for this patch is a complaint about iproute2 behaving
    inconsistently between IPv4 and IPv6 when adding policy rules: Formerly,
    IPv6 rules were assigned a fixed priority of 0x3FFF whereas for IPv4 the
    assigned priority value was decreased with each rule added.

    Since all users of the default_pref field have now been converted to
    assign the generic function fib_default_rule_pref(), fib_nl_newrule()
    may just use it directly instead. Therefore get rid of the function
    pointer altogether and make fib_default_rule_pref() static, as it's not
    used outside fib_rules.c anymore.

    Signed-off-by: Phil Sutter
    Signed-off-by: David S. Miller

    Phil Sutter
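The "decreasing priority" behaviour this commit extends to IPv6 can be sketched as follows: a rule added without an explicit priority gets a value one below the second rule in the list. This is a simplified illustration, not the kernel function. `default_rule_pref()` and the flat array are hypothetical stand-ins for the rule list, and the reading that the first entry is the priority-0 rule is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* prefs[] holds the priorities of the installed rules in list order
 * (hypothetical flat-array stand-in for the kernel's rule list). */
static uint32_t default_rule_pref(const uint32_t *prefs, size_t n)
{
    /* The first entry is assumed to be the priority-0 rule, so a new
     * rule is slotted in just ahead of the second entry. */
    if (n >= 2 && prefs[1] != 0)
        return prefs[1] - 1;
    return 0;
}
```

With rules at priorities 0, 32766 and 32767, the first rule added without a priority would get 32765 in this sketch, the next one 32764, and so on.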
     

09 Sep, 2015

3 commits

  • Pull infiniband/rdma updates from Doug Ledford:
    "This is a fairly sizeable set of changes. I've put them through a
    decent amount of testing prior to sending the pull request due to
    that.

    There are still a few fixups that I know are coming, but I wanted to
    go ahead and get the big, sizable chunk into your hands sooner rather
    than waiting for those last few fixups.

    Of note is the fact that this creates what is intended to be a
    temporary area in the drivers/staging tree specifically for some
    cleanups and additions that are coming for the RDMA stack. We
    deprecated two drivers (ipath and amso1100) and are waiting to hear
    back if we can deprecate another one (ehca). We also put Intel's new
    hfi1 driver into this area because it needs to be refactored and a
    transfer library created out of the factored out code, and then it and
    the qib driver and the soft-roce driver should all be modified to use
    that library.

    I expect drivers/staging/rdma to be around for three or four kernel
    releases and then to go away as all of the work is completed and final
    deletions of deprecated drivers are done.

    Summary of changes for 4.3:

    - Create drivers/staging/rdma
    - Move amso1100 driver to staging/rdma and schedule for deletion
    - Move ipath driver to staging/rdma and schedule for deletion
    - Add hfi1 driver to staging/rdma and set TODO for move to regular
    tree
    - Initial support for namespaces to be used on RDMA devices
    - Add RoCE GID table handling to the RDMA core caching code
    - Infrastructure to support handling of devices with differing read
    and write scatter gather capabilities
    - Various iSER updates
    - Kill off unsafe usage of global mr registrations
    - Update SRP driver
    - Misc mlx4 driver updates
    - Support for the mr_alloc verb
    - Support for a netlink interface between kernel and user space cache
    daemon to speed path record queries and route resolution
    - Initial support for safe hot removal of verbs devices"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (136 commits)
    IB/ipoib: Suppress warning for send only join failures
    IB/ipoib: Clean up send-only multicast joins
    IB/srp: Fix possible protection fault
    IB/core: Move SM class defines from ib_mad.h to ib_smi.h
    IB/core: Remove unnecessary defines from ib_mad.h
    IB/hfi1: Add PSM2 user space header to header_install
    IB/hfi1: Add CSRs for CONFIG_SDMA_VERBOSITY
    mlx5: Fix incorrect wc pkey_index assignment for GSI messages
    IB/mlx5: avoid destroying a NULL mr in reg_user_mr error flow
    IB/uverbs: reject invalid or unknown opcodes
    IB/cxgb4: Fix if statement in pick_local_ip6adddrs
    IB/sa: Fix rdma netlink message flags
    IB/ucma: HW Device hot-removal support
    IB/mlx4_ib: Disassociate support
    IB/uverbs: Enable device removal when there are active user space applications
    IB/uverbs: Explicitly pass ib_dev to uverbs commands
    IB/uverbs: Fix race between ib_uverbs_open and remove_one
    IB/uverbs: Fix reference counting usage of event files
    IB/core: Make ib_dealloc_pd return void
    IB/srp: Create an insecure all physical rkey only if needed
    ...

    Linus Torvalds
     
  • The only user is sock_update_memcg which is living in memcontrol.c so it
    doesn't make much sense to pollute sock.h with this inline helper. Move it
    to memcontrol.c and open code it into its only caller.

    Signed-off-by: Michal Hocko
    Cc: Vladimir Davydov
    Cc: Johannes Weiner
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • mem_cgroup structure is defined in mm/memcontrol.c currently which means
    that the code outside of this file has to use external API even for
    trivial access stuff.

    This patch exports struct mem_cgroup with its dependencies and makes some
    of the exported functions inlines. This even helps to reduce the code size
    a bit (make defconfig + CONFIG_MEMCG=y):

        text    data     bss      dec    hex filename
    12355346 1823792 1089536 15268674 e8fb42 vmlinux.before
    12354970 1823792 1089536 15268298 e8f9ca vmlinux.after

    This is not much (376B) but better than nothing.

    We also save a function call in some hot paths like callers of
    mem_cgroup_count_vm_event which is used for accounting.

    The patch doesn't introduce any functional changes.

    [vdavydov@parallels.com: inline memcg_kmem_is_active]
    [vdavydov@parallels.com: do not expose type outside of CONFIG_MEMCG]
    [akpm@linux-foundation.org: memcontrol.h needs eventfd.h for eventfd_ctx]
    [akpm@linux-foundation.org: export mem_cgroup_from_task() to modules]
    Signed-off-by: Michal Hocko
    Reviewed-by: Vladimir Davydov
    Suggested-by: Johannes Weiner
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

07 Sep, 2015

1 commit

  • …kernel/git/jberg/mac80211

    Johannes Berg says:

    ====================
    For the first round of fixes, we have this:
    * fix for the sizeof() pointer type issue
    * a fix for regulatory getting into a restore loop
    * a fix for rfkill global 'all' state, it needs to be stored
    everywhere to apply correctly to new rfkill instances
    * properly refuse CQM RSSI when it cannot actually be used
    * protect HT TDLS traffic properly in non-HT networks
    * don't incorrectly advertise 80 MHz support when not allowed
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>

    David S. Miller
     

06 Sep, 2015

1 commit

  • Conflicts:
    include/net/netfilter/nf_conntrack.h

    The conflict was an overlap between changing the type of the zone
    argument to nf_ct_tmpl_alloc() whilst exporting nf_ct_tmpl_free.

    Pablo Neira Ayuso says:

    ====================
    Netfilter fixes for net

    The following patchset contains Netfilter fixes for net, they are:

    1) Oneliner to restore maps in nf_tables since we support addressing registers
    at 32 bits level.

    2) Restore previous default behaviour in bridge netfilter when CONFIG_IPV6=n,
    oneliner from Bernhard Thaler.

    3) Out of bound access in ipset hash:net* set types, reported by Dave Jones'
    KASan utility, patch from Jozsef Kadlecsik.

    4) Fix ipset compilation with gcc 4.4.7 related to C99 initialization of
    unnamed unions, patch from Elad Raz.

    5) Add a workaround to address inconsistent endianness in the res_id field of
    nfnetlink batch messages, reported by Florian Westphal.

    6) Fix error paths of CT/synproxy since the conntrack template was moved to use
    kmalloc, patch from Daniel Borkmann.

    All of them look good to me to reach 4.2, I can route this to -stable myself
    too, just let me know what you prefer.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

04 Sep, 2015

1 commit

  • HT TDLS traffic should be protected in a non-HT BSS to avoid
    collisions. Therefore, when TDLS peers join/leave, check if
    protection is (now) needed and set the ht_operation_mode of
    the virtual interface according to the HT capabilities of the
    TDLS peer(s).

    This works because a non-HT BSS connection never sets (or
    otherwise uses) the ht_operation_mode; it just means that
    drivers must be aware that this field applies to all HT
    traffic for this virtual interface, not just the traffic
    within the BSS. Document that.

    Signed-off-by: Avri Altman
    Signed-off-by: Johannes Berg

    Avri Altman
     

03 Sep, 2015

1 commit

  • Fengguang reported that a randconfig build produced the following linker
    issue involving the nf_ct_zone_dflt object:

    [...]
    CC init/version.o
    LD init/built-in.o
    net/built-in.o: In function `ipv4_conntrack_defrag':
    nf_defrag_ipv4.c:(.text+0x93e95): undefined reference to `nf_ct_zone_dflt'
    net/built-in.o: In function `ipv6_defrag':
    nf_defrag_ipv6_hooks.c:(.text+0xe3ffe): undefined reference to `nf_ct_zone_dflt'
    make: *** [vmlinux] Error 1

    Given that configurations exist where we have a built-in part, which is
    accessing nf_ct_zone_dflt such as the two handlers nf_ct_defrag_user()
    and nf_ct6_defrag_user(), and a part that configures nf_conntrack as a
    module, we must move nf_ct_zone_dflt into a fixed, guaranteed built-in
    area when netfilter is configured in general.

    Therefore, split the more generic parts into a common header under
    include/linux/netfilter/ and move nf_ct_zone_dflt into the built-in
    section that already holds parts related to CONFIG_NF_CONNTRACK in the
    netfilter core. This fixes the issue on my side.

    Fixes: 308ac9143ee2 ("netfilter: nf_conntrack: push zone object into functions")
    Reported-by: Fengguang Wu
    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

02 Sep, 2015

10 commits

  • Signed-off-by: David S. Miller

    David S. Miller
     
  • Just have a flags member instead.

    In file included from include/linux/linkage.h:4:0,
    from include/linux/kernel.h:6,
    from net/core/flow_dissector.c:1:
    In function 'flow_keys_hash_start',
    inlined from 'flow_hash_from_keys' at net/core/flow_dissector.c:553:34:
    >> include/linux/compiler.h:447:38: error: call to '__compiletime_assert_459' declared with attribute error: BUILD_BUG_ON failed: FLOW_KEYS_HASH_OFFSET % sizeof(u32)

    Reported-by: kbuild test robot
    Signed-off-by: David S. Miller

    David S. Miller
     
  • Add an input flag to flow dissector on whether dissection should stop
    when encapsulation is detected (IP/IP or GRE). Also, add a key_control
    flag that indicates encapsulation was encountered during the
    dissection.

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     
  • Add an input flag to flow dissector on whether dissection should be
    stopped when a flow label is encountered. Presumably, the flow label
    is derived from a sufficient hash of an inner transport packet so
    further dissection is not needed (that is ports are not included in
    the flow hash). Using the flow label instead of ports has the additional
    benefit that packet fragments should hash to same value as non-fragments
    for a flow (assuming that the same flow label is used).

    We set this flag by default for skb_get_hash.

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     
  • Add an input flag to flow dissector on whether dissection should be
    stopped when an L3 packet is encountered. This would be useful if a
    caller just wanted to get IP addresses of the outermost header (e.g.
    to do an L3 hash).

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     
  • Add an input flag to flow dissector on whether dissection should be
    attempted on a first fragment. Also add key_control flags to indicate
    that a packet is a fragment or first fragment.

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
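The four input flags introduced by this commit and the three before it can be pictured as bits that a caller passes in to limit how far dissection goes. The sketch below is a heavily simplified model in C, not the dissector itself. The flag names and values, `struct pkt_state` and `dissect_past_l3()` are all hypothetical, and the real code makes these decisions at different points while walking the headers.

```c
#include <stdbool.h>

/* Hypothetical flag values for illustration only. */
#define DISSECT_PARSE_1ST_FRAG     (1u << 0)
#define DISSECT_STOP_AT_L3         (1u << 1)
#define DISSECT_STOP_AT_FLOW_LABEL (1u << 2)
#define DISSECT_STOP_AT_ENCAP      (1u << 3)

/* Hypothetical summary of what has been seen in the packet so far. */
struct pkt_state {
    bool is_fragment;
    bool is_first_fragment;
    bool has_flow_label;
    bool is_encap;            /* IP/IP or GRE */
};

/* Return true if dissection should continue past the L3 header. */
static bool dissect_past_l3(const struct pkt_state *p, unsigned int flags)
{
    if (flags & DISSECT_STOP_AT_L3)
        return false;
    if ((flags & DISSECT_STOP_AT_FLOW_LABEL) && p->has_flow_label)
        return false;
    if ((flags & DISSECT_STOP_AT_ENCAP) && p->is_encap)
        return false;
    if (p->is_fragment) {
        /* Only a first fragment carries the L4 header, and it is
         * parsed only when the caller explicitly asks for it. */
        return p->is_first_fragment &&
               (flags & DISSECT_PARSE_1ST_FRAG) != 0;
    }
    return true;
}
```

A caller that only wants an L3 hash would pass the stop-at-L3 bit, while a caller hashing on the flow label would pass the flow-label bit so that fragments and non-fragments of one flow hash alike.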
     
  • Create __get_hash_from_flowi6 and __get_hash_from_flowi4 to get the
    flow keys and hash based on flowi structures. These are called by
    __skb_get_hash_flowi6 and __skb_get_hash_flowi4. Also, created
    get_hash_from_flowi6 and get_hash_from_flowi4 which can be called
    when just the hash value for a flowi is needed.

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     
  • Move __skb_set_sw_hash to skbuff.h and add __skb_set_hash which is
    a common method (between __skb_set_sw_hash and skb_set_hash) to set
    the hash in an skbuff.

    Also, move skb_clear_hash to be closer to __skb_set_hash.

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     
  • Move the flow dissector functions that are specific to skbuffs into
    skbuff.h out of flow_dissector.h. This makes flow_dissector.h have
    no dependencies on skbuff.h.

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     
  • A number of VRF patches used 'int' for table id. It should be u32 to be
    consistent with the rest of the stack.

    Fixes:
    4e3c89920cd3a ("net: Introduce VRF related flags and helpers")
    15be405eb2ea9 ("net: Add inet_addr lookup by table")
    30bbaa1950055 ("net: Fix up inet_addr_type checks")
    021dd3b8a142d ("net: Add routes to the table associated with the device")
    dc028da54ed35 ("inet: Move VRF table lookup to inlined function")
    f6d3c19274c74 ("net: FIB tracepoints")

    Signed-off-by: David Ahern
    Reviewed-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    David Ahern
     

01 Sep, 2015

5 commits

  • Commit 0838aa7fcfcd ("netfilter: fix netns dependencies with conntrack
    templates") migrated templates to the new allocator api, but forgot to
    update error paths for them in CT and synproxy to use nf_ct_tmpl_free()
    instead of nf_conntrack_free().

    Due to that, memory is being freed into the wrong kmemcache, but also
    we drop the per net reference count of ct objects causing an imbalance.

    In Brad's case, this leads to a wrap-around of net->ct.count and thus
    lets __nf_conntrack_alloc() refuse to create a new ct object:

    [ 10.340913] xt_addrtype: ipv6 does not support BROADCAST matching
    [ 10.810168] nf_conntrack: table full, dropping packet
    [ 11.917416] r8169 0000:07:00.0 eth0: link up
    [ 11.917438] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
    [ 12.815902] nf_conntrack: table full, dropping packet
    [ 15.688561] nf_conntrack: table full, dropping packet
    [ 15.689365] nf_conntrack: table full, dropping packet
    [ 15.690169] nf_conntrack: table full, dropping packet
    [ 15.690967] nf_conntrack: table full, dropping packet
    [...]

    With slab debugging, it also reports the wrong kmemcache (kmalloc-512 vs.
    nf_conntrack_ffffffff81ce75c0) and reports poison overwrites, etc. Thus,
    to fix the problem, export and use nf_ct_tmpl_free() instead.

    Fixes: 0838aa7fcfcd ("netfilter: fix netns dependencies with conntrack templates")
    Reported-by: Brad Jackson
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Pablo Neira Ayuso

    Daniel Borkmann
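The "table full" symptom from an unbalanced counter can be reproduced in a few lines of C. The sketch assumes a signed counter that is compared against an unsigned limit, so a counter that has dropped below zero looks like an enormous value. `struct ct_net` and `ct_alloc_allowed()` are hypothetical stand-ins for illustration, not the conntrack code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-netns conntrack accounting. */
struct ct_net {
    int count;          /* signed counter of live objects */
    unsigned int max;   /* table limit, 0 means unlimited */
};

/* Account for one new object; refuse if the table looks full. */
static bool ct_alloc_allowed(struct ct_net *net)
{
    net->count++;
    /* Unsigned comparison: a negative count converts to a huge
     * value and is treated as a full table. */
    if (net->max && (unsigned int)net->count > net->max) {
        net->count--;
        return false;
    }
    return true;
}
```

Once extra decrements from the wrong free path have pushed the counter far enough below zero, every allocation in this sketch is refused even though the table is empty.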
     
  • opts_size is only written and never read. The following patch
    removes this unused variable.

    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Pravin B Shelar
     
  • As David pointed out, spinlocks are no longer needed
    to protect the per cpu queues used in gro cells infrastructure.

    Also use new napi_complete_done() API so that gro_flush_timeout
    tweaks have an effect.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Currently, the following case doesn't use DCTCP, even if it should:
    A responder has f.e. Cubic as system wide default, but for a specific
    route to the initiating host, DCTCP is being set in RTAX_CC_ALGO. The
    initiating host then uses DCTCP as congestion control, but since the
    initiator sets ECT(0), tcp_ecn_create_request() doesn't set ecn_ok,
    and we have to fall back to Reno after 3WHS completes.

    We were thinking about how to solve this in a minimal, non-intrusive
    way without bloating tcp_ecn_create_request() needlessly: let's cache
    the CA ecn option flag in RTAX_FEATURES. In other words, when ECT(0)
    is set on the SYN packet, set ecn_ok=1 iff route RTAX_FEATURES
    contains the unexposed (internal-only) DST_FEATURE_ECN_CA. This allows
    doing only a single metric feature lookup inside tcp_ecn_create_request().

    Joint work with Florian Westphal.

    Signed-off-by: Daniel Borkmann
    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • Currently the tun-info options pointer is used in a few cases to
    pass options around. But tunnel options can be accessed using the
    ip_tunnel_info_opts() API without using the pointer. The following
    patch removes the redundant pointer and consistently makes use
    of the API.

    Signed-off-by: Pravin B Shelar
    Acked-by: Thomas Graf
    Reviewed-by: Jesse Gross
    Signed-off-by: David S. Miller

    Pravin B Shelar
     

31 Aug, 2015

3 commits


30 Aug, 2015

3 commits

  • By default (subject to the sysctl settings), IPv6 sockets listen also for
    IPv4 traffic. Vxlan is not prepared for that and expects IPv6 header in
    packets received through an IPv6 socket.

    In addition, it's currently not possible to have both IPv4 and IPv6 vxlan
    tunnel on the same port (unless bindv6only sysctl is enabled), as it's not
    possible to create and bind both IPv4 and IPv6 vxlan interfaces and there's
    no way to specify both IPv4 and IPv6 remote/group IP addresses.

    Set IPV6_V6ONLY on vxlan sockets to fix both of these issues. This is not
    done globally in udp_tunnel, as l2tp and tipc seem to work okay when
    receiving IPv4 packets on IPv6 socket and people may rely on this behavior.
    The other tunnels (geneve and fou) do not support IPv6.

    Signed-off-by: Jiri Benc
    Signed-off-by: David S. Miller

    Jiri Benc
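For readers more familiar with userspace sockets, the effect of IPV6_V6ONLY can be shown with the standard sockets API. This is a userspace analogue only; the commit sets the option on the in-kernel vxlan socket, and the helper name `udp6_socket_v6only()` is hypothetical.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a UDP socket that accepts IPv6 traffic only.
 * Returns the file descriptor, or -1 on error. */
static int udp6_socket_v6only(void)
{
    int one = 1;
    int fd = socket(AF_INET6, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    /* Without this option, and subject to the bindv6only sysctl, the
     * socket would also receive IPv4 traffic as v4-mapped addresses. */
    if (setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &one, sizeof(one)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

With the option set, a separate AF_INET socket can be bound to the same port, which is what makes IPv4 and IPv6 tunnels on one port possible.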
     
  • There's currently nothing preventing directing packets with IPv6
    encapsulation data to IPv4 tunnels (and vice versa). If this happens,
    IPv6 addresses are incorrectly interpreted as IPv4 ones.

    Track whether the given ip_tunnel_key contains IPv4 or IPv6 data. Store this
    in ip_tunnel_info. Reject packets at appropriate places if they are supposed
    to be encapsulated into an incompatible protocol.

    Signed-off-by: Jiri Benc
    Acked-by: Alexei Starovoitov
    Acked-by: Thomas Graf
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Jiri Benc
     
  • The mode field holds a single bit of information only (whether the
    ip_tunnel_info struct is for rx or tx). Change the mode field to bit flags.
    This allows more mode flags to be added.

    Signed-off-by: Jiri Benc
    Acked-by: Alexei Starovoitov
    Acked-by: Thomas Graf
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Jiri Benc
     

29 Aug, 2015

1 commit

  • Pablo Neira Ayuso says:

    ====================
    Netfilter updates for net-next

    The following patchset contains Netfilter/IPVS updates for your net-next tree.
    In sum, patches to address fallout from the previous round plus updates from
    the IPVS folks via Simon Horman, they are:

    1) Add a new scheduler to IPVS: The weighted overflow scheduling algorithm
    directs network connections to the server with the highest weight that is
    currently available and overflows to the next when active connections exceed
    the node's weight. From Raducu Deaconu.

    2) Fix locking ordering in IPVS, always take rtnl_lock first. Patch
    from Julian Anastasov.

    3) Allow to indicate the MTU to the IPVS in-kernel state sync daemon. From
    Julian Anastasov.

    4) Enhance multicast configuration for the IPVS state sync daemon. Also from
    Julian.

    5) Resolve sparse warnings in the nf_dup modules.

    6) Fix a linking problem when CONFIG_NF_DUP_IPV6 is not set.

    7) Add ICMP codes 5 and 6 to IPv6 REJECT target, they are more informative
    subsets of code 1. From Andreas Herz.

    8) Revert the jumpstack size calculation from mark_source_chains due to chain
    depth miscalculations, from Florian Westphal.

    9) Calm down more sparse warnings around the Netfilter tree, again from Florian
    Westphal.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
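The weighted overflow scheduling described in item 1 of this pull request can be sketched as a simple selection loop: take the highest-weight server that still has room, and overflow to the next one once its active connections reach its weight. This is a minimal illustration, not the IPVS scheduler. `struct server` and `wo_pick()` are hypothetical names, and treating the weight directly as the connection threshold follows the description above.

```c
#include <stddef.h>

/* Hypothetical real-server descriptor for illustration. */
struct server {
    int weight;        /* also the connection threshold in this sketch */
    int active_conns;
};

/* Pick the highest-weight server whose active connections are still
 * below its weight; return NULL if every server is saturated. */
static struct server *wo_pick(struct server *srv, size_t n)
{
    struct server *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (srv[i].weight <= 0 || srv[i].active_conns >= srv[i].weight)
            continue;   /* disabled or already at its threshold */
        if (!best || srv[i].weight > best->weight)
            best = &srv[i];
    }
    return best;
}
```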