28 Dec, 2016

2 commits

  • Pull networking fixes from David Miller:

    1) Various ipvlan fixes from Eric Dumazet and Mahesh Bandewar.

    The most important is to not assume the packet is RX just because
    the destination address matches that of the device. Such an
    assumption causes problems when an interface is put into loopback
    mode.

    2) If we retry when creating a new tc entry (because we dropped the
    RTNL mutex in order to load a module, for example) we end up with
    -EAGAIN and then loop trying to replay the request. But we didn't
    reset some state when looping back to the top like this, and if
    another thread meanwhile inserted the same tc entry we were trying
    to, we re-linked it, creating an endless loop in the tc chain. Fix from
    Daniel Borkmann.

    3) There are two different WRITE bits in the MDIO address register for
    the stmmac chip, depending upon the chip variant. Due to a bug we
    could set them both; fix from Hock Leong Kweh.

    4) Fix mlx4 bug in XDP_TX handling, from Tariq Toukan.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
    net: stmmac: fix incorrect bit set in gmac4 mdio addr register
    r8169: add support for RTL8168 series add-on card.
    net: xdp: remove unused bfp_warn_invalid_xdp_buffer()
    openvswitch: upcall: Fix vlan handling.
    ipv4: Namespaceify tcp_tw_reuse knob
    net: korina: Fix NAPI versus resources freeing
    net, sched: fix soft lockup in tc_classify
    net/mlx4_en: Fix user prio field in XDP forward
    tipc: don't send FIN message from connectionless socket
    ipvlan: fix multicast processing
    ipvlan: fix various issues in ipvlan_process_multicast()

    Linus Torvalds
     
  • Different namespaces might have different requirements for reusing
    TIME-WAIT sockets for new connections. This might be required when
    applications in different namespaces need to reduce the number of
    TIME-WAIT sockets independently of the host.

    Signed-off-by: Haishuang Yan
    Signed-off-by: David S. Miller

    Haishuang Yan
     

26 Dec, 2016

1 commit

  • ktime is a union because the initial implementation stored the time in
    scalar nanoseconds on 64-bit machines and in an endianness-optimized
    timespec variant on 32-bit machines. The Y2038 cleanup removed the
    timespec variant and switched everything to scalar nanoseconds. The
    union remained, but became completely pointless.

    Get rid of the union and just keep ktime_t as a simple typedef of type s64.

    The conversion was done with coccinelle and some manual mopping up.

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra

    Thomas Gleixner
     

25 Dec, 2016

1 commit


18 Dec, 2016

2 commits

  • Pull networking fixes and cleanups from David Miller:

    1) Revert bogus nla_ok() change, from Alexey Dobriyan.

    2) Various bpf validator fixes from Daniel Borkmann.

    3) Add some necessary SET_NETDEV_DEV() calls to hsis_femac and hip04
    drivers, from Dongpo Li.

    4) Several ethtool ksettings conversions from Philippe Reynes.

    5) Fix bugs in inet port management wrt. soreuseport, from Tom Herbert.

    6) XDP support for virtio_net, from John Fastabend.

    7) Fix NAT handling within a vrf, from David Ahern.

    8) Endianness fixes in dpaa_eth driver, from Claudiu Manoil

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (63 commits)
    net: mv643xx_eth: fix build failure
    isdn: Constify some function parameters
    mlxsw: spectrum: Mark split ports as such
    cgroup: Fix CGROUP_BPF config
    qed: fix old-style function definition
    net: ipv6: check route protocol when deleting routes
    r6040: move spinlock in r6040_close as SOFTIRQ-unsafe lock order detected
    irda: w83977af_ir: cleanup an indent issue
    net: sfc: use new api ethtool_{get|set}_link_ksettings
    net: davicom: dm9000: use new api ethtool_{get|set}_link_ksettings
    net: cirrus: ep93xx: use new api ethtool_{get|set}_link_ksettings
    net: chelsio: cxgb3: use new api ethtool_{get|set}_link_ksettings
    net: chelsio: cxgb2: use new api ethtool_{get|set}_link_ksettings
    bpf: fix mark_reg_unknown_value for spilled regs on map value marking
    bpf: fix overflow in prog accounting
    bpf: dynamically allocate digest scratch buffer
    gtp: Fix initialization of Flags octet in GTPv1 header
    gtp: gtp_check_src_ms_ipv4() always return success
    net/x25: use designated initializers
    isdn: use designated initializers
    ...

    Linus Torvalds
     
  • A user may call listen() without binding an explicit port, with the
    intent that the kernel will assign an available port to the socket. In
    this case inet_csk_get_port() does a port scan. For such sockets, the
    user may also set soreuseport with the intent of creating more sockets
    for the port that is selected. The problem is that the initial socket
    being opened could inadvertently choose an existing and unrelated port
    number that was already created with soreuseport.

    This patch adds a boolean parameter to inet_bind_conflict() that
    indicates whether soreuseport is allowed for the check (in addition to
    sk->sk_reuseport). In calls to inet_bind_conflict() from
    inet_csk_get_port(), the argument is set to true if an explicit port is
    being looked up (the snum argument is nonzero), and to false if a port
    scan is done.

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     

17 Dec, 2016

1 commit

  • Pull vfs updates from Al Viro:

    - more ->d_init() stuff (work.dcache)

    - pathname resolution cleanups (work.namei)

    - a few missing iov_iter primitives - copy_from_iter_full() and
    friends. Either copy the full requested amount, advance the iterator
    and return true, or fail, return false and do _not_ advance the
    iterator. Quite a few open-coded callers converted (and became more
    readable and harder to fuck up that way) (work.iov_iter)

    - several assorted patches, the big one being logfs removal

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    logfs: remove from tree
    vfs: fix put_compat_statfs64() does not handle errors
    namei: fold should_follow_link() with the step into not-followed link
    namei: pass both WALK_GET and WALK_MORE to should_follow_link()
    namei: invert WALK_PUT logics
    namei: shift interpretation of LOOKUP_FOLLOW inside should_follow_link()
    namei: saner calling conventions for mountpoint_last()
    namei.c: get rid of user_path_parent()
    switch getfrag callbacks to ..._full() primitives
    make skb_add_data,{_nocache}() and skb_copy_to_page_nocache() advance only on success
    [iov_iter] new primitives - copy_from_iter_full() and friends
    don't open-code file_inode()
    ceph: switch to use of ->d_init()
    ceph: unify dentry_operations instances
    lustre: switch to use of ->d_init()

    Linus Torvalds
     

14 Dec, 2016

1 commit


13 Dec, 2016

1 commit

  • Pull smp hotplug updates from Thomas Gleixner:
    "This is the final round of converting the notifier mess to the state
    machine. The removal of the notifiers and the related infrastructure
    will happen around rc1, as there are conversions outstanding in other
    trees.

    The whole exercise removed about 2000 lines of code in total and in
    the course of the conversion several dozen bugs got fixed. The new
    mechanism allows testing almost every hotplug step standalone, so
    usage sites can exercise all transitions extensively.

    There is more room for improvement, like integrating all the
    pointlessly different architecture mechanisms of synchronizing,
    setting cpus online etc into the core code"

    * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
    tracing/rb: Init the CPU mask on allocation
    soc/fsl/qbman: Convert to hotplug state machine
    soc/fsl/qbman: Convert to hotplug state machine
    zram: Convert to hotplug state machine
    KVM/PPC/Book3S HV: Convert to hotplug state machine
    arm64/cpuinfo: Convert to hotplug state machine
    arm64/cpuinfo: Make hotplug notifier symmetric
    mm/compaction: Convert to hotplug state machine
    iommu/vt-d: Convert to hotplug state machine
    mm/zswap: Convert pool to hotplug state machine
    mm/zswap: Convert dst-mem to hotplug state machine
    mm/zsmalloc: Convert to hotplug state machine
    mm/vmstat: Convert to hotplug state machine
    mm/vmstat: Avoid on each online CPU loops
    mm/vmstat: Drop get_online_cpus() from init_cpu_node_state/vmstat_cpu_dead()
    tracing/rb: Convert to hotplug state machine
    oprofile/nmi timer: Convert to hotplug state machine
    net/iucv: Use explicit clean up labels in iucv_init()
    x86/pci/amd-bus: Convert to hotplug state machine
    x86/oprofile/nmi: Convert to hotplug state machine
    ...

    Linus Torvalds
     

10 Dec, 2016

1 commit


09 Dec, 2016

4 commits

  • When mac80211 abandons an association attempt, it may free
    all the data structures, but inform cfg80211 and userspace
    about it only by sending the deauth frame it received, in
    which case cfg80211 has no link to the BSS struct that was
    used and will not cfg80211_unhold_bss() it.

    Fix this by providing a way to inform cfg80211 of this with
    the BSS entry passed, so that it can clean up properly, and
    use this ability in the appropriate places in mac80211.

    This isn't ideal: some code is more or less duplicated and
    tracing is missing. However, it's a fairly small change and
    it's thus easier to backport - cleanups can come later.

    Cc: stable@vger.kernel.org
    Signed-off-by: Johannes Berg

    Johannes Berg
     
  • sk_drops can be a frequently written field, so do not read it unless
    the application has shown interest.

    Note that sk_drops can be read via inet_diag, so applications
    can avoid getting this info from every received packet.

    In the future, 'reading' sk_drops might require folding per node or per
    cpu fields, and thus become even more expensive than today.

    Signed-off-by: Eric Dumazet
    Cc: Paolo Abeni
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • RFS is not commonly used, so add a jump label to avoid some conditionals
    in the fast path.

    Signed-off-by: Eric Dumazet
    Cc: Paolo Abeni
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Allow dissection of ICMP(V6) type and code. This should only occur
    if a packet is ICMP(V6) and the dissector has FLOW_DISSECTOR_KEY_ICMP set.

    There are currently no users of FLOW_DISSECTOR_KEY_ICMP.
    A follow-up patch will allow FLOW_DISSECTOR_KEY_ICMP to be used by
    the flower classifier.

    Signed-off-by: Simon Horman
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Simon Horman
     

08 Dec, 2016

1 commit

  • Pablo Neira Ayuso says:

    ====================
    Netfilter/IPVS updates for net-next

    The following patchset contains a large Netfilter update for net-next,
    to summarise:

    1) Add support for stateful objects. This series provides a nf_tables
    native alternative to the extended accounting infrastructure for
    nf_tables. Two initial stateful objects are supported: counters and
    quotas. Objects are identified by a user-defined name; you can fetch
    and reset them at any time. You can also use maps to allow fast lookups
    using any arbitrary key combination. More info at:

    http://marc.info/?l=netfilter-devel&m=148029128323837&w=2

    2) On-demand registration of nf_conntrack and defrag hooks per netns.
    Register nf_conntrack hooks if we have a stateful ruleset, i.e.
    state-based filtering or NAT. The new nf_conntrack_default_on sysctl
    enables this for newly created network namespaces. Default behaviour is
    not modified. Patches from Florian Westphal.

    3) Allocate 4k chunks and then use these for x_tables counter allocation
    requests, this improves ruleset load time and also datapath ruleset
    evaluation, patches from Florian Westphal.

    4) Add support for ebpf to the existing x_tables bpf extension.
    From Willem de Bruijn.

    5) Update the layer 4 checksum if any of the pseudoheader fields is updated.
    This provides a limited form of 1:1 stateless NAT that makes sense in
    specific scenarios, e.g. load balancing.

    6) Add support to flush sets in nf_tables. This series comes with a new
    set->ops->deactivate_one() indirection given that we have to walk
    over the list of set elements, then deactivate them one by one.
    The existing set->ops->deactivate() performs an element lookup that
    we don't need.

    7) Two patches to avoid cloning packets, thus speed up packet forwarding
    via nft_fwd from ingress. From Florian Westphal.

    8) Two IPVS patches via Simon Horman: Decrement ttl in all modes to
    prevent infinite loops, patch from Dwip Banerjee. And one minor
    refactoring from Gao feng.

    9) Revisit recent log support for nf_tables netdev families: One patch
    to ensure that we correctly handle non-ethernet packets. Another
    patch to add missing logger definition for netdev. Patches from
    Liping Zhang.

    10) Three patches for nft_fib, one to address insufficient register
    initialization and another to solve incorrect (although harmless)
    byteswap operation. Moreover, update xt_rpfilter and nft_fib to match
    limited broadcast (lbcast) packets with zeronet as source, e.g. DHCP
    Discover packets (0.0.0.0 -> 255.255.255.255). Also from Liping Zhang.

    11) Built-in DCCP, SCTP and UDPlite conntrack and NAT support, from
    Davide Caratti. While DCCP is rather hopeless lately, and UDPlite has
    been broken in many-cast mode for some time, let's give them a
    chance by placing them at the same level as other existing protocols.
    Thus, users don't have to explicitly modprobe support for them, and
    NAT rules work for them. Some people point to the lack of support in
    SOHO Linux-based routers as what makes deployment of new protocols
    harder. I guess other middleboxes out there on the Internet are also
    to blame. Anyway, let's see if this has any impact in the medium term.

    12) Skip SCTP software checksum calculation if the NIC comes
    with SCTP checksum offload support. From Davide Caratti.

    13) Initial core factoring to prepare conversion to hook array. Three
    patches from Aaron Conole.

    14) Gao Feng made a wrong conversion to switch in the xt_multiport
    extension in a patch coming in the previous batch. Fix it in this
    batch.

    15) Keep the vmalloc call in sync with the kmalloc flags to avoid a
    warning and likely OOM killer intervention from x_tables. From Marcelo
    Ricardo Leitner.

    16) Update Arturo Borrero's email address in all source code headers.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

07 Dec, 2016

10 commits

  • Paolo noticed a cache line miss in UDP recvmsg() to access
    sk_rxhash, sharing a cache line with sk_drops.

    sk_drops might be heavily incremented by cpus handling a flood targeting
    this socket.

    We might place sk_drops on a separate cache line, but let's try
    to avoid wasting 64 bytes per socket just for this, since we have
    other bottlenecks to take care of.

    sock_rps_record_flow() should only access sk_rxhash for connected
    flows.

    Testing sk_state for TCP_ESTABLISHED covers most of the cases for
    connected sockets, at zero cost, since system calls using
    sock_rps_record_flow() also access sk->sk_prot, which is on the
    same cache line.

    A follow-up patch will provide a static_key (jump label), since most
    hosts do not even use RFS.

    Signed-off-by: Eric Dumazet
    Reported-by: Paolo Abeni
    Acked-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • This patch adds support for set flushing, that consists of walking over
    the set elements if the NFTA_SET_ELEM_LIST_ELEMENTS attribute is set.
    This patch requires the following changes:

    1) Add set->ops->deactivate_one() operation: This allows us to
    deactivate an element from the set element walk path, given we can
    skip the lookup that happens in ->deactivate().

    2) Add a new nft_trans_alloc_gfp() function, since we need to allocate
    transactions using GFP_ATOMIC given that the set walk path runs with
    rcu_read_lock held.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • This patch allows you to refer to stateful objects from set elements.
    This provides the infrastructure to create maps where the right hand
    side of the mapping is a stateful object.

    This allows us to build dictionaries of stateful objects, that you can
    use to perform fast lookups using any arbitrary key combination.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • Notify on depleted quota objects. The NFT_QUOTA_F_DEPLETED flag
    indicates that the quota has been used up.

    Add a pointer to the table from nft_object, so we can use it when
    sending the depletion notification to userspace.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • Introduce nf_tables_obj_notify() to notify internal state changes in
    stateful objects. This is used by the quota object to report depletion
    in a follow up patch.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • This patch adds a new NFT_MSG_GETOBJ_RESET command to perform an atomic
    dump-and-reset of the stateful object. This also adds support for
    atomic dump and reset of counter and quota objects.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • This patch augments nf_tables to support stateful objects. This new
    infrastructure allows you to create, dump and delete stateful objects,
    that are identified by a user-defined name.

    This patch adds the generic infrastructure; follow-up patches add
    support for two stateful objects: counters and quotas.

    This patch provides a native infrastructure for nf_tables to replace
    nfacct, the extended accounting infrastructure for iptables.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • ... so we can use current skb instead of working with a clone.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • This patch adds a new flag that signals the kernel to update the layer 4
    checksum if the packet field belongs to the layer 4 pseudoheader. This
    implicitly provides 1:1 stateless NAT, which is useful in very specific
    use cases.

    Since rules mangling layer 3 fields that are part of the pseudoheader
    may potentially convey any layer 4 packet, we have to deal with the
    layer 4 checksum adjustment using protocol specific code.

    This patch adds support for TCP, UDP and ICMPv6, since they include the
    pseudoheader in the layer 4 checksum calculation. ICMP doesn't, so we
    can skip it.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • nf_defrag modules for ipv4 and ipv6 export an empty stub function.
    Any module that needs the defragmentation hooks registered simply 'calls'
    this empty function to create a phony module dependency -- modprobe will
    then load the defrag module too.

    This extends netfilter ipv4/ipv6 defragmentation modules to delay the hook
    registration until the functionality is requested within a network namespace
    instead of module load time for all namespaces.

    Hooks are only un-registered on module unload or when a namespace that used
    such defrag functionality exits.

    We have to use struct net for this as the register hooks can be called
    before netns initialization here from the ipv4/ipv6 conntrack module
    init path.

    There is no unregister functionality support, defrag will always be
    active once it was requested inside a net namespace.

    The reason is that defrag has an impact on nft and iptables rulesets
    (without defrag we might see fragments).

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

06 Dec, 2016

5 commits

  • 1) Old code was hard to maintain, due to complex lock chains.
    (We probably will be able to remove some kfree_rcu() in callers)

    2) Using a single timer to update all estimators does not scale.

    3) Code was buggy on 32-bit kernels (WRITE_ONCE() on a 64-bit quantity
    is not supposed to work well).

    In this rewrite:

    - I removed the RB tree that had to be scanned in
    gen_estimator_active(). qdisc dumps should be much faster.

    - Each estimator has its own timer.

    - Estimations are maintained in net_rate_estimator structure,
    instead of dirtying the qdisc. Minor, but part of the simplification.

    - Reading the estimator uses RCU and a seqcount to provide proper
    support for 32bit kernels.

    - We reduce memory need when estimators are not used, since
    we store a pointer, instead of the bytes/packets counters.

    - xt_rateest_mt() no longer has to grab a spinlock.
    (In the future, xt_rateest_tg() could be switched to per cpu counters)

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • …etooth/bluetooth-next

    Johan Hedberg says:

    ====================
    pull request: bluetooth-next 2016-12-03

    Here's a set of Bluetooth & 802.15.4 patches for net-next (i.e. 4.10
    kernel):

    - Fix for a potential NULL deref in the ieee802154 netlink code
    - Fix for the ED values of the at86rf2xx driver
    - Documentation updates to ieee802154
    - Cleanups to u8 vs __u8 usage
    - Timer API usage cleanups in HCI drivers

    Please let me know if there are any issues pulling. Thanks.
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>

    David S. Miller
     
  • Group fields used in TX path, and keep some cache lines mostly read
    to permit sharing among cpus.

    Gained two 4-byte holes on 64-bit arches.

    Added a placeholder for tcp tsq_flags, next to sk_wmem_alloc,
    to speed up tcp_wfree() in the following patch.

    I have not added ____cacheline_aligned_in_smp, this might be done later.
    I prefer doing this once inet and tcp/udp sockets reorg is also done.

    Tested with both TCP and UDP.

    UDP receiver performance under flood increased by ~20%:
    Accessing sk_filter/sk_wq/sk_napi_id no longer stalls because sk_drops
    was moved away from a critical cache line, now mostly read and shared.

    /* --- cacheline 4 boundary (256 bytes) --- */
    unsigned int               sk_napi_id;       /* 0x100   0x4 */
    int                        sk_rcvbuf;        /* 0x104   0x4 */
    struct sk_filter *         sk_filter;        /* 0x108   0x8 */
    union {
        struct socket_wq * sk_wq;                /*         0x8 */
        struct socket_wq * sk_wq_raw;            /*         0x8 */
    };                                           /* 0x110   0x8 */
    struct xfrm_policy *       sk_policy[2];     /* 0x118  0x10 */
    struct dst_entry *         sk_rx_dst;        /* 0x128   0x8 */
    struct dst_entry *         sk_dst_cache;     /* 0x130   0x8 */
    atomic_t                   sk_omem_alloc;    /* 0x138   0x4 */
    int                        sk_sndbuf;        /* 0x13c   0x4 */
    /* --- cacheline 5 boundary (320 bytes) --- */
    int                        sk_wmem_queued;   /* 0x140   0x4 */
    atomic_t                   sk_wmem_alloc;    /* 0x144   0x4 */
    long unsigned int          sk_tsq_flags;     /* 0x148   0x8 */
    struct sk_buff *           sk_send_head;     /* 0x150   0x8 */
    struct sk_buff_head        sk_write_queue;   /* 0x158  0x18 */
    __s32                      sk_peek_off;      /* 0x170   0x4 */
    int                        sk_write_pending; /* 0x174   0x4 */
    long int                   sk_sndtimeo;      /* 0x178   0x8 */

    Signed-off-by: Eric Dumazet
    Tested-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Eric Dumazet
     

05 Dec, 2016

10 commits

  • This switch (default on) can be used to disable automatic registration
    of connection tracking functionality in newly created network
    namespaces.

    This means that when a net namespace goes down (or the tracker protocol
    module is unloaded) we *might* have to unregister the hooks.

    We can either add another per-netns variable that tells if
    the hooks got registered by default, or, alternatively, just call
    the protocol _put() function and have the callee deal with a possible
    'extra' put() operation that doesn't pair with a get() one.

    This uses the latter approach, i.e. a put() without a get has no effect.

    Conntrack is still enabled automatically regardless of the new sysctl
    setting if the new net namespace requires connection tracking, e.g. when
    NAT rules are created.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • This makes use of nf_ct_netns_get/put added in previous patch.
    We add get/put functions to nf_conntrack_l3proto structure, ipv4 and ipv6
    then implement use-count to track how many users (nft or xtables modules)
    have a dependency on ipv4 and/or ipv6 connection tracking functionality.

    When count reaches zero, the hooks are unregistered.

    This delays activation of connection tracking inside a namespace until
    stateful firewall rule or nat rule gets added.

    This patch breaks backwards compatibility in the sense that connection
    tracking won't be active anymore when the protocol tracker module is
    loaded. This breaks e.g. setups that use ctnetlink for flow accounting
    and the like, without any '-m conntrack' packet filter rules.

    A follow-up patch restores the old behaviour and makes the new delayed
    scheme optional via sysctl.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
    Currently aliased to try_module_get/_put.
    This will be changed in the next patch, when we add functions that make
    use of the ->net argument to store a usercount per l3proto tracker.

    This is needed to avoid registering the conntrack hooks in all netns and
    later only enable connection tracking in those that need conntrack.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
    Since adf0516845bcd0 ("netfilter: remove ip_conntrack* sysctl compat code"),
    the only user (the ipv4 tracker) sets this to an empty stub function.

    After this change nf_ct_l3proto_pernet_register() is also empty,
    but this will change in a followup patch to add conditional register
    of the hooks.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • CONFIG_NF_CT_PROTO_UDPLITE is no longer a tristate. When set to y,
    connection tracking support for the UDPlite protocol is built into
    nf_conntrack.ko.

    footprint test:
    $ ls -l net/netfilter/nf_conntrack{_proto_udplite,}.ko \
    net/ipv4/netfilter/nf_conntrack_ipv4.ko \
    net/ipv6/netfilter/nf_conntrack_ipv6.ko

    (builtin) || udplite | ipv4   | ipv6   | nf_conntrack
    ----------++---------+--------+--------+-------------
    none      || 432538  | 828755 | 828676 | 6141434
    UDPlite   || -       | 829649 | 829362 | 6498204

    Signed-off-by: Davide Caratti
    Signed-off-by: Pablo Neira Ayuso

    Davide Caratti
     
  • CONFIG_NF_CT_PROTO_SCTP is no longer a tristate. When set to y, connection
    tracking support for the SCTP protocol is built into nf_conntrack.ko.

    footprint test:
    $ ls -l net/netfilter/nf_conntrack{_proto_sctp,}.ko \
    net/ipv4/netfilter/nf_conntrack_ipv4.ko \
    net/ipv6/netfilter/nf_conntrack_ipv6.ko

    (builtin) || sctp    | ipv4   | ipv6   | nf_conntrack
    ----------++---------+--------+--------+-------------
    none      || 498243  | 828755 | 828676 | 6141434
    SCTP      || -       | 829254 | 829175 | 6547872

    Signed-off-by: Davide Caratti
    Signed-off-by: Pablo Neira Ayuso

    Davide Caratti
     
  • CONFIG_NF_CT_PROTO_DCCP is no longer a tristate. When set to y, connection
    tracking support for the DCCP protocol is built into nf_conntrack.ko.

    footprint test:
    $ ls -l net/netfilter/nf_conntrack{_proto_dccp,}.ko \
    net/ipv4/netfilter/nf_conntrack_ipv4.ko \
    net/ipv6/netfilter/nf_conntrack_ipv6.ko

    (builtin) || dccp    | ipv4   | ipv6   | nf_conntrack
    ----------++---------+--------+--------+-------------
    none      || 469140  | 828755 | 828676 | 6141434
    DCCP      || -       | 830566 | 829935 | 6533526

    Signed-off-by: Davide Caratti
    Signed-off-by: Pablo Neira Ayuso

    Davide Caratti
     
  • In the netdev family, we will handle non-Ethernet packets, so using
    eth_hdr(skb)->h_proto is incorrect.

    Meanwhile, we can use socket(AF_PACKET...) to send packets, so
    skb->protocol is not always set in the bridge family.

    Add an extra parameter to nf_log_l2packet to solve this issue.

    Fixes: 1fddf4bad0ac ("netfilter: nf_log: add packet logging for netdev family")
    Signed-off-by: Liping Zhang
    Signed-off-by: Pablo Neira Ayuso

    Liping Zhang
     
  • CONFIG_NF_NAT_PROTO_UDPLITE is no longer a tristate. When set to y, NAT
    support for the UDPlite protocol is built into nf_nat.ko.

    footprint test:

    (nf_nat_proto_)  | udplite || nf_nat
    -----------------+---------++--------
    no builtin       | 408048  || 2241312
    UDPLITE builtin  | -       || 2577256

    Signed-off-by: Davide Caratti
    Signed-off-by: Pablo Neira Ayuso

    Davide Caratti
     
  • CONFIG_NF_NAT_PROTO_SCTP is no longer a tristate. When set to y, NAT
    support for the SCTP protocol is built into nf_nat.ko.

    footprint test:

    (nf_nat_proto_)  | sctp    || nf_nat
    -----------------+---------++--------
    no builtin       | 428344  || 2241312
    SCTP builtin     | -       || 2597032

    Signed-off-by: Davide Caratti
    Signed-off-by: Pablo Neira Ayuso

    Davide Caratti