10 May, 2016

1 commit


05 May, 2016

1 commit

  • Steffen Klassert says:

    ====================
    pull request (net): ipsec 2016-05-04

    1) The flowcache can hit an OOM condition if too
    many entries are in the gc_list. Fix this by
    counting the entries in the gc_list and refusing
    new allocations if the value is too high.

    2) The inner headers are invalid after an xfrm transformation,
    so reset the skb encapsulation field to ensure nobody tries to
    access the inner headers. Otherwise tunnel devices stacked
    on top of xfrm may build the outer headers based on wrong
    information.

    3) Add PMTU handling to vti; we need it to report
    PMTU information for locally generated packets.

    Please pull or let me know if there are problems.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
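Item 1 describes a bounded garbage-collection list: every entry moved onto the gc_list bumps a counter, the GC worker decrements it as entries are freed, and new allocations are refused while the counter is too high. The sketch below is a userspace C11 model of that idea only; `struct gc_list_model`, `GC_LIST_LIMIT` and all function names are hypothetical, not the kernel flow cache code, and the limit value is arbitrary.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical limit chosen for the sketch, not the kernel's value. */
#define GC_LIST_LIMIT 4

/* Model of a flow cache that counts entries waiting for garbage collection. */
struct gc_list_model {
    atomic_int gc_count;   /* entries queued on the gc_list, not yet freed */
};

/* An entry was evicted and moved onto the gc_list. */
static void gc_list_enqueue(struct gc_list_model *fc)
{
    atomic_fetch_add(&fc->gc_count, 1);
}

/* The GC worker freed one entry from the gc_list. */
static void gc_list_reclaim_one(struct gc_list_model *fc)
{
    int old = atomic_fetch_sub(&fc->gc_count, 1);
    assert(old > 0);   /* never reclaim more than was queued */
}

/* Refuse new flow allocations while too many entries wait for GC. */
static bool flow_alloc_allowed(struct gc_list_model *fc)
{
    return atomic_load(&fc->gc_count) <= GC_LIST_LIMIT;
}
```

The point of counting rather than walking the list is that the allocation path only pays for one atomic load, whereas an unbounded gc_list can grow until memory runs out.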
     

24 Apr, 2016

1 commit


25 Mar, 2016

1 commit

  • A crash is observed when a decrypted packet is processed in the receive
    path. get_rps_cpu() dereferences skb->dev fields, but the poison
    pattern shows that the device has already been freed.

    [] get_rps_cpu+0x94/0x2f0
    [] netif_rx_internal+0x140/0x1cc
    [] netif_rx+0x74/0x94
    [] xfrm_input+0x754/0x7d0
    [] xfrm_input_resume+0x10/0x1c
    [] esp_input_done+0x20/0x30
    [] process_one_work+0x244/0x3fc
    [] worker_thread+0x2f8/0x418
    [] kthread+0xe0/0xec

    -013|get_rps_cpu(
    | dev = 0xFFFFFFC08B688000,
    | skb = 0xFFFFFFC0C76AAC00 -> (
    | dev = 0xFFFFFFC08B688000 -> (
    | name = "......................................................
    | name_hlist = (next = 0xAAAAAAAAAAAAAAAA, pprev = 0xAAAAAAAAAAA

    The following sequence of events was observed:

    - Encrypted packet in receive path from netdevice is queued
    - Encrypted packet queued for decryption (asynchronous)
    - Netdevice brought down and freed
    - Packet is decrypted and returned through callback in esp_input_done
    - Packet is queued again for processing in the network stack using netif_rx

    Since the device appears to have been freed, the dereference of
    skb->dev in get_rps_cpu() leads to an unhandled page fault
    exception.

    Fix this by holding a reference to the device when queueing packets
    asynchronously and releasing the reference when the callback returns.

    v2: Make the change generic to xfrm, as mentioned by Steffen, and
    update the title to xfrm.

    Suggested-by: Herbert Xu
    Signed-off-by: Jerome Stanislaus
    Signed-off-by: Subash Abhinov Kasiviswanathan
    Signed-off-by: David S. Miller

    subashab@codeaurora.org
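The fix follows the usual reference-counting rule for asynchronous work: take a device reference before the packet leaves for the crypto engine and drop it only in the completion path, so unregistering the device cannot free it in between. The sketch below models that pattern in plain C; `struct fake_dev`, `dev_hold_model()`, `dev_put_model()` and the other names are hypothetical stand-ins, not the kernel's `dev_hold()`/`dev_put()` or the actual xfrm_input code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct net_device with a reference count. */
struct fake_dev {
    int refcnt;   /* 1 for the registration itself, +1 per holder */
    bool freed;   /* set instead of calling free() so the sketch can observe it */
};

static void dev_hold_model(struct fake_dev *dev) { dev->refcnt++; }

static void dev_put_model(struct fake_dev *dev)
{
    assert(dev->refcnt > 0);
    if (--dev->refcnt == 0)
        dev->freed = true;   /* last reference gone: only now may it be freed */
}

/* Packet model: remembers which device it arrived on. */
struct fake_skb { struct fake_dev *dev; };

/* Before queueing for asynchronous decryption, pin the device. */
static void async_queue(struct fake_skb *skb)
{
    dev_hold_model(skb->dev);
}

/* Completion callback: skb->dev is still valid here, so code that
 * dereferences it (as get_rps_cpu() does) is safe. The reference is
 * dropped only once the packet has been handed on. */
static bool async_done(struct fake_skb *skb)
{
    bool dev_alive = !skb->dev->freed;
    dev_put_model(skb->dev);
    return dev_alive;
}

/* Bringing the device down drops the registration reference. */
static void dev_unregister_model(struct fake_dev *dev) { dev_put_model(dev); }
```

Replaying the observed sequence (queue, device brought down, completion) against this model leaves the device alive until `async_done()` runs, which is exactly the window the crash fell into.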
     

23 Mar, 2016

1 commit


17 Mar, 2016

1 commit


27 Jan, 2016

1 commit

  • This patch removes the last reference to hash and ablkcipher from
    IPsec and replaces them with ahash and skcipher respectively. For
    skcipher there is currently no difference at all, while for ahash
    the current code is actually buggy and would prevent asynchronous
    algorithms from being discovered.

    Signed-off-by: Herbert Xu
    Acked-by: David S. Miller

    Herbert Xu
     

16 Jan, 2016

1 commit

  • skb_gso_segment() uses the skb control block during segmentation.
    This patch reserves 32 bytes of room for the previous control block,
    which will be copied into all resulting segments.

    This patch fixes a kernel crash when fragmenting forwarded packets.
    Fragmentation requires a valid IP CB in the skb for clearing IP options.
    The patch also removes the custom save/restore in the OVS code, which
    is now redundant.

    Signed-off-by: Konstantin Khlebnikov
    Link: http://lkml.kernel.org/r/CALYGNiP-0MZ-FExV2HutTvE9U-QQtkKSoE--KN=JQE5STYsjAA@mail.gmail.com
    Signed-off-by: David S. Miller

    Konstantin Khlebnikov
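The idea is to split the control block into two regions: the first 32 bytes belong to whoever owned the skb before segmentation (for example the IP CB), and GSO keeps its own scratch data only behind that prefix, so the prefix survives and can be copied into every segment. The model below illustrates this with a 48-byte `cb` array (the size of `skb->cb[]`); `GSO_CB_OFFSET`, `struct gso_cb_model` and the helper functions are hypothetical names, and `memcpy()` is used instead of pointer casts to stay within strict-aliasing rules.

```c
#include <string.h>

#define CB_SIZE 48         /* size of skb->cb[] in the kernel */
#define GSO_CB_OFFSET 32   /* hypothetical name: room kept for the caller's CB */

struct fake_skb { unsigned char cb[CB_SIZE]; };

/* Hypothetical GSO scratch state, stored only behind the reserved prefix. */
struct gso_cb_model { int mac_offset; int encap_level; };

_Static_assert(GSO_CB_OFFSET + sizeof(struct gso_cb_model) <= CB_SIZE,
               "GSO scratch data must fit behind the reserved prefix");

static void gso_cb_store(struct fake_skb *skb, const struct gso_cb_model *g)
{
    memcpy(skb->cb + GSO_CB_OFFSET, g, sizeof(*g));
}

static void gso_cb_load(const struct fake_skb *skb, struct gso_cb_model *g)
{
    memcpy(g, skb->cb + GSO_CB_OFFSET, sizeof(*g));
}

/* Each segment inherits the first GSO_CB_OFFSET bytes of the original
 * CB (e.g. the IP CB that fragmentation needs later). */
static void segment_model(const struct fake_skb *orig, struct fake_skb *segs, int n)
{
    for (int i = 0; i < n; i++) {
        memset(segs[i].cb, 0, CB_SIZE);
        memcpy(segs[i].cb, orig->cb, GSO_CB_OFFSET);
    }
}
```

With the prefix preserved by construction, callers no longer need their own save/restore around segmentation, which is why the OVS copy became redundant.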
     

23 Dec, 2015

1 commit


12 Dec, 2015

2 commits

  • XFRM can deal with SYNACK messages sent while the listener socket
    is not locked. We add proper RCU protection to __xfrm_sk_clone_policy()
    and xfrm_sk_policy_lookup().

    This might serve as the first step to remove xfrm.xfrm_policy_lock
    use in the fast path.

    Fixes: fa76ce7328b2 ("inet: get rid of central tcp/dccp listener timer")
    Signed-off-by: Eric Dumazet
    Acked-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • We will soon switch sk->sk_policy[] to RCU protection,
    as SYNACK packets are sent while listener socket is not locked.

    This patch simply adds an RCU grace period before freeing struct
    xfrm_policy, and the corresponding rcu_head in struct xfrm_policy.

    Signed-off-by: Eric Dumazet
    Acked-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Eric Dumazet
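The two patches above rely on the RCU publish/subscribe pattern: a writer fully initialises a policy and then publishes the pointer, while lockless readers load the pointer and see either the old or the new object, never a half-built one. The sketch below is only a userspace analogue using C11 release/acquire atomics, which approximates the ordering that `rcu_assign_pointer()`/`rcu_dereference()` provide; it deliberately does not model the grace period, which in the kernel (or a userspace RCU library) is what makes freeing the old pointer safe. `struct policy_model`, `struct sock_model` and the function names are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, cut-down policy object. */
struct policy_model { int priority; int action; };

/* Socket slot read locklessly, in the spirit of sk->sk_policy[dir]. */
struct sock_model { _Atomic(struct policy_model *) policy; };

/* Writer: the object must be fully initialised before this call. The
 * release half of acq_rel orders those stores before the publication.
 * The returned old pointer must NOT be freed until every reader that
 * might still hold it is done (in the kernel: after a grace period). */
static struct policy_model *policy_publish(struct sock_model *sk,
                                           struct policy_model *newp)
{
    return atomic_exchange_explicit(&sk->policy, newp, memory_order_acq_rel);
}

/* Reader: the acquire load pairs with the writer's release. */
static struct policy_model *policy_lookup(struct sock_model *sk)
{
    return atomic_load_explicit(&sk->policy, memory_order_acquire);
}
```

The missing piece in this sketch is precisely what the second patch adds: an `rcu_head` in the policy so that freeing can be deferred until after a grace period.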
     

08 Dec, 2015

1 commit

  • TCP SYNACK messages might now be attached to request sockets.

    XFRM needs to get back to a listener socket.

    Add new helpers that might be used elsewhere:
    sk_to_full_sk() and sk_const_to_full_sk().

    Note: We also need to add RCU protection for xfrm lookups,
    now that TCP/DCCP have lockless listener processing. This will
    be addressed in separate patches.

    Fixes: ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead of listener")
    Reported-by: Dave Jones
    Signed-off-by: Eric Dumazet
    Cc: Steffen Klassert
    Signed-off-by: David S. Miller

    Eric Dumazet
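The helpers map a socket that may be a request socket back to a full socket: if the socket is in the NEW_SYN_RECV state, return its listener, otherwise return it unchanged. The sketch below models that mapping with a single simplified struct; in the kernel, request sockets are a distinct type reached through `inet_reqsk()`, so `struct sock_model`, the state enum and the `_model` functions are hypothetical simplifications.

```c
#include <stddef.h>

/* Hypothetical state values; only the NEW_SYN_RECV distinction matters. */
enum sk_state_model { SK_LISTEN, SK_NEW_SYN_RECV, SK_ESTABLISHED };

struct sock_model {
    enum sk_state_model state;
    struct sock_model *rsk_listener;   /* meaningful only for request socks */
};

/* Return the full socket: a request socket is replaced by its listener. */
static struct sock_model *sk_to_full_sk_model(struct sock_model *sk)
{
    if (sk && sk->state == SK_NEW_SYN_RECV)
        sk = sk->rsk_listener;
    return sk;
}

/* const flavour, mirroring the sk_const_to_full_sk() split. */
static const struct sock_model *
sk_const_to_full_sk_model(const struct sock_model *sk)
{
    if (sk && sk->state == SK_NEW_SYN_RECV)
        sk = sk->rsk_listener;
    return sk;
}
```

XFRM can then apply this mapping to the socket attached to a SYNACK and look up per-socket policy on the listener, which is where that policy lives.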
     

03 Nov, 2015

1 commit

  • Remove the dst_entries_init/destroy calls for xfrm4 and xfrm6 dst_ops
    templates; their dst_entries counters will never be used. Move the
    xfrm dst_ops initialization from the common xfrm/xfrm_policy.c to
    xfrm4/xfrm4_policy.c and xfrm6/xfrm6_policy.c, and call dst_entries_init
    and dst_entries_destroy for each net namespace.

    The ipv4 and ipv6 xfrms each create a dst_ops template, and perform
    dst_entries_init on the templates. The template values are copied to each
    net namespace's xfrm.xfrm*_dst_ops. The problem there is the dst_ops
    pcpuc_entries field is a percpu counter and cannot be used correctly by
    simply copying it to another object.

    The result of this is a very subtle bug; changes to the dst entries
    counter from one net namespace may sometimes get applied to a different
    net namespace dst entries counter. This is because of how the percpu
    counter works; it has a main count field as well as a pointer to the
    percpu variables. Each net namespace maintains its own main count
    variable, but all point to one set of percpu variables. When any net
    namespace happens to change one of the percpu variables to outside its
    small batch range, its count is moved to the net namespace's main count
    variable. So with multiple net namespaces operating concurrently, the
    dst_ops entries counter can stray from the actual value that it should
    be; if counts are consistently moved from one net namespace to another
    (which my testing showed is likely), then one net namespace winds up
    with a negative dst_ops count while another winds up with a continually
    increasing count, eventually reaching its gc_thresh limit, which causes
    all new traffic on the net namespace to fail with -ENOBUFS.

    Signed-off-by: Dan Streetman
    Signed-off-by: Dan Streetman
    Signed-off-by: Steffen Klassert

    Dan Streetman
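The drift described above can be reproduced with a small model of a per-CPU counter: a main count plus a pointer to per-CPU deltas, where a delta that reaches the batch size is folded into the main count of whichever object performed the update. Copying the struct copies the pointer, so two namespaces end up sharing one set of per-CPU deltas. The sketch below is a simplified model of that behaviour, not the kernel's `struct percpu_counter`; `BATCH`, `NR_CPUS_MODEL` and the function names are hypothetical.

```c
#include <assert.h>

#define NR_CPUS_MODEL 2
#define BATCH 4   /* hypothetical batch size */

/* Cut-down model: a main count plus a POINTER to per-CPU deltas. */
struct pcpu_counter_model {
    long count;
    long *counters;   /* NR_CPUS_MODEL per-CPU deltas */
};

static void pcpu_add(struct pcpu_counter_model *c, int cpu, long amount)
{
    assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
    long v = c->counters[cpu] + amount;
    if (v >= BATCH || v <= -BATCH) {
        c->count += v;          /* folded into THIS object's main count */
        c->counters[cpu] = 0;
    } else {
        c->counters[cpu] = v;
    }
}

/* Approximate value: main count plus every per-CPU delta. */
static long pcpu_sum(const struct pcpu_counter_model *c)
{
    long sum = c->count;
    for (int i = 0; i < NR_CPUS_MODEL; i++)
        sum += c->counters[i];
    return sum;
}
```

If namespace A adds 3 and namespace B then adds 1 on the same CPU through copies of one template, B's update crosses the batch and folds A's entries into B's main count; when A later releases its 3 entries, A reads -3 while B reads 4. Giving each namespace its own per-CPU storage, which is what calling dst_entries_init per net namespace does, removes the sharing.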
     

30 Oct, 2015

1 commit

  • Steffen Klassert says:

    ====================
    pull request (net-next): ipsec-next 2015-10-30

    1) The flow cache is limited by the flow cache limit, which
    depends on the number of cpus, and by the xfrm garbage collector
    threshold, which is independent of the number of cpus. As a
    result, on systems with more than 16 cpus we hit the xfrm garbage
    collector limit and refuse new allocations, so new flows are
    dropped. On systems with 16 or fewer cpus we hit the flowcache
    limit; in this case we shrink the flow cache instead of refusing
    new flows.

    We increase the xfrm garbage collector threshold to INT_MAX
    to get the same behaviour, independent of the number of cpus.

    2) Fix some unaligned accesses on sparc systems.
    From Sowmini Varadhan.

    3) Fix some header checks in _decode_session4. We may call
    pskb_may_pull with a negative value converted to unsigned
    int, which can lead to incorrect policy lookups. We fix this
    by checking the data pointer position before we call
    pskb_may_pull.

    4) Reload skb header pointers after calling pskb_may_pull
    in _decode_session4 as this may change the pointers into
    the packet.

    5) Add a missing statistic counter on inner mode errors.

    Please pull or let me know if there are problems.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
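Item 3 is a classic signed/unsigned pitfall: if the transport header pointer already lies before the current data pointer, `xprth + 4 - data` is negative, and passing it as an `unsigned int` length turns it into a huge value, so the pull check fails even though the bytes are readable. The sketch below shows the failure and a guard that checks the pointer position first; `may_pull_model()` is a hypothetical stand-in for `pskb_may_pull()` that only compares against an available byte count, and the function names are illustrative, not taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for pskb_may_pull(): true if 'len' bytes past
 * the data pointer are available in the linear area of 'avail' bytes. */
static bool may_pull_model(size_t avail, unsigned int len)
{
    return len <= avail;
}

/* Unguarded: a negative difference is implicitly converted to a huge
 * unsigned int, so the check fails for headers already behind data. */
static bool ports_readable_unguarded(const unsigned char *data, size_t avail,
                                     const unsigned char *xprth)
{
    return may_pull_model(avail, xprth + 4 - data);
}

/* Guarded: a header that lies entirely before data needs no pull;
 * otherwise the difference is non-negative and safe to convert. */
static bool ports_readable_guarded(const unsigned char *data, size_t avail,
                                   const unsigned char *xprth)
{
    return xprth + 4 < data ||
           may_pull_model(avail, (unsigned int)(xprth + 4 - data));
}
```

Item 4 is the companion rule: because a successful pull may reallocate the packet buffer, any header pointer computed before the call has to be recomputed afterwards.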
     

24 Oct, 2015

1 commit

  • Conflicts:
    net/ipv6/xfrm6_output.c
    net/openvswitch/flow_netlink.c
    net/openvswitch/vport-gre.c
    net/openvswitch/vport-vxlan.c
    net/openvswitch/vport.c
    net/openvswitch/vport.h

    The openvswitch conflicts were overlapping changes. One was
    the egress tunnel info fix in 'net' and the other was the
    vport ->send() op simplification in 'net-next'.

    The xfrm6_output.c conflict was also a simplification
    overlapping a bug fix.

    Signed-off-by: David S. Miller

    David S. Miller
     

23 Oct, 2015

2 commits

  • Increment the LINUX_MIB_XFRMINSTATEMODEERROR statistic counter
    to notify about dropped packets if we fail to fetch an inner mode.

    Signed-off-by: Steffen Klassert

    Steffen Klassert
     
  • On sparc, deleting established SAs (e.g., by restarting ipsec)
    results in unaligned access messages via xfrm_del_sa ->
    km_state_notify -> xfrm_send_state_notify().

    Even though struct xfrm_usersa_info is aligned on 8-byte boundaries,
    netlink attributes are fundamentally only 4-byte aligned, and this
    cannot be changed for nla_data() that is passed up to userspace.
    As a result, the put_unaligned() macro needs to be used to
    set up potentially unaligned fields such as the xfrm_stats in
    copy_to_user_state().

    Signed-off-by: Sowmini Varadhan
    Signed-off-by: Steffen Klassert

    Sowmini Varadhan
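In the kernel the fix uses the `put_unaligned()` macro. In portable userspace C the equivalent idiom is `memcpy()` into the destination bytes, which lets the compiler choose an access sequence that is safe for the target, instead of a direct 8-byte store that traps on strict-alignment CPUs such as sparc. The helper names below are hypothetical; this is a sketch of the idiom, not the kernel macro.

```c
#include <stdint.h>
#include <string.h>

/* Store a 64-bit value at an address that may be only 4-byte aligned. */
static void put_unaligned_u64(uint64_t val, void *p)
{
    memcpy(p, &val, sizeof(val));
}

/* Load a 64-bit value from a possibly misaligned address. */
static uint64_t get_unaligned_u64(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}
```

For example, writing a 64-bit statistics field at offset 4 of a netlink payload with `put_unaligned_u64()` is safe on every architecture, while `*(uint64_t *)(payload + 4) = val` is not.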
     

08 Oct, 2015

5 commits


29 Sep, 2015

1 commit

  • Allow changing the replay threshold (XFRMA_REPLAY_THRESH) and expiry
    timer (XFRMA_ETIMER_THRESH) of a state without having to set other
    attributes like the replay counter and byte lifetime. Changing these
    other values while traffic flows will break the state.

    Signed-off-by: Michael Rossberg
    Signed-off-by: Steffen Klassert

    Michael Rossberg
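The behaviour described treats each threshold as an optional attribute: an update touches only the fields that are present and leaves live state such as the replay counter untouched. The sketch below models that with explicit presence flags; `struct sa_model`, `struct sa_update_model` and `sa_apply_update()` are hypothetical and do not reflect the real xfrm_user netlink parsing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of an SA's mutable parameters. */
struct sa_model {
    uint32_t replay_seq;      /* live replay counter: must not be reset */
    uint64_t byte_lifetime;
    uint32_t replay_thresh;   /* cf. XFRMA_REPLAY_THRESH */
    uint32_t etimer_thresh;   /* cf. XFRMA_ETIMER_THRESH */
};

/* Hypothetical update request: a field is applied only if present,
 * the way an optional netlink attribute would be. */
struct sa_update_model {
    bool has_replay_thresh, has_etimer_thresh;
    uint32_t replay_thresh, etimer_thresh;
};

static void sa_apply_update(struct sa_model *sa, const struct sa_update_model *u)
{
    if (u->has_replay_thresh)
        sa->replay_thresh = u->replay_thresh;
    if (u->has_etimer_thresh)
        sa->etimer_thresh = u->etimer_thresh;
    /* replay_seq and byte_lifetime stay as they are while traffic flows */
}
```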
     

26 Sep, 2015

1 commit


18 Sep, 2015

5 commits

  • In code review it was noticed that I had failed to add some blank lines
    in places where they are customarily used. Taking a second look at the
    code I have to agree blank lines would be nice so I have added them
    here.

    Reported-by: Nicolas Dichtel
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • This is immediately motivated by the bridge code that chains functions that
    call into netfilter. Without passing net into the okfns the bridge code would
    need to guess about the best expression for the network namespace to process
    packets in.

    As net is frequently one of the first things computed in continuation functions
    after netfilter has done its job, passing in the desired network namespace is in
    many cases a code simplification.

    To support this change the function dst_output_okfn is introduced to
    simplify passing dst_output as an okfn. For the moment dst_output_okfn
    just silently drops the struct net.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
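The change widens the continuation signature so that the hook runner, which already knows the namespace, passes it on, and it bridges functions that do not yet take a namespace with an adapter that drops it. The sketch below models both parts; all types and function names are hypothetical simplifications, with `dst_output_okfn_model()` mimicking the role the text gives to `dst_output_okfn()`.

```c
#include <assert.h>
#include <stddef.h>

struct net_model { int id; };
struct sock_model;                  /* opaque in this sketch */
struct skb_model { int len; int sent; };

/* New-style continuation: receives the namespace explicitly. */
typedef int (*okfn_t)(struct net_model *net, struct sock_model *sk,
                      struct skb_model *skb);

/* Stand-in for an output function that does not take a net yet. */
static int dst_output_model(struct sock_model *sk, struct skb_model *skb)
{
    (void)sk;
    skb->sent = 1;
    return 0;
}

/* Adapter in the spirit of dst_output_okfn(): accept the new
 * signature and silently drop the struct net. */
static int dst_output_okfn_model(struct net_model *net, struct sock_model *sk,
                                 struct skb_model *skb)
{
    (void)net;
    return dst_output_model(sk, skb);
}

/* Hook runner: hands the namespace it already knows to the okfn. */
static int nf_hook_model(struct net_model *net, struct sock_model *sk,
                         struct skb_model *skb, okfn_t okfn)
{
    assert(okfn != NULL);
    /* ...verdict processing would happen here... */
    return okfn(net, sk, skb);
}
```

The adapter keeps the conversion incremental: callers switch to the new signature at once, and the output path can start using `net` later without touching them again.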
     
  • Pass a network namespace parameter into the netfilter hooks. At the
    call site of the netfilter hooks the path a packet is taking through
    the network stack is well known, which allows the network namespace to
    be computed easily and reliably.

    This allows the replacement of magic code like
    "dev_net(state->in?:state->out)" that appears at the start of most
    netfilter hooks with "state->net".

    In almost all cases the network namespace passed in is derived
    from the first network device passed in, guaranteeing those
    paths will not see any changes in practice.

    The exceptions are:
    xfrm/xfrm_output.c:xfrm_output_resume() xs_net(skb_dst(skb)->xfrm)
    ipvs/ip_vs_xmit.c:ip_vs_nat_send_or_cont() ip_vs_conn_net(cp)
    ipvs/ip_vs_xmit.c:ip_vs_send_or_cont() ip_vs_conn_net(cp)
    ipv4/raw.c:raw_send_hdrinc() sock_net(sk)
    ipv6/ip6_output.c:ip6_xmit() sock_net(sk)
    ipv6/ndisc.c:ndisc_send_skb() dev_net(skb->dev) not dev_net(dst->dev)
    ipv6/raw.c:raw6_send_hdrinc() sock_net(sk)
    br_netfilter_hooks.c:br_nf_pre_routing_finish() dev_net(skb->dev) before skb->dev is set to nf_bridge->physindev

    In all cases these exceptions seem to be a better expression for the
    network namespace the packet is being processed in than the historic
    "dev_net(in?in:out)". I am documenting them in case something odd
    pops up and someone starts trying to track down what happened.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • Add a sock parameter to dst_output, making dst_output_sk superfluous.
    Add an skb->sk argument to all of the callers of dst_output.
    Have the callers of dst_output_sk call dst_output.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

03 Sep, 2015

1 commit

  • Pull networking updates from David Miller:
    "Another merge window, another set of networking changes. I've heard
    rumblings that the lightweight tunnels infrastructure has been voted
    networking change of the year. But what do I know?

    1) Add conntrack support to openvswitch, from Joe Stringer.

    2) Initial support for VRF (Virtual Routing and Forwarding), which
    allows the segmentation of routing paths without using multiple
    devices. There are some semantic kinks to work out still, but
    this is a reasonably strong foundation. From David Ahern.

    3) Remove spinlock from act_bpf fast path, from Alexei Starovoitov.

    4) Ignore route nexthops with a link down state in ipv6, just like
    ipv4. From Andy Gospodarek.

    5) Remove spinlock from fast path of act_gact and act_mirred, from
    Eric Dumazet.

    6) Document the DSA layer, from Florian Fainelli.

    7) Add netconsole support to bcmgenet, systemport, and DSA. Also
    from Florian Fainelli.

    8) Add Mellanox Switch Driver and core infrastructure, from Jiri
    Pirko.

    9) Add support for "light weight tunnels", which allow for
    encapsulation and decapsulation without bearing the overhead of a
    full blown netdevice. From Thomas Graf, Jiri Benc, and a cast of
    others.

    10) Add Identifier Locator Addressing support for ipv6, from Tom
    Herbert.

    11) Support fragmented SKBs in iwlwifi, from Johannes Berg.

    12) Allow perf PMUs to be accessed from eBPF programs, from Kaixu Xia.

    13) Add BQL support to 3c59x driver, from Loganaden Velvindron.

    14) Stop using a zero TX queue length to mean that a device shouldn't
    have a qdisc attached, use an explicit flag instead. From Phil
    Sutter.

    15) Use generic geneve netdevice infrastructure in openvswitch, from
    Pravin B Shelar.

    16) Add infrastructure to avoid re-forwarding a packet in software
    that was already forwarded by a hardware switch. From Scott
    Feldman.

    17) Allow AF_PACKET fanout function to be implemented in a bpf
    program, from Willem de Bruijn"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1458 commits)
    netfilter: nf_conntrack: make nf_ct_zone_dflt built-in
    netfilter: nf_dup{4, 6}: fix build error when nf_conntrack disabled
    net: fec: clear receive interrupts before processing a packet
    ipv6: fix exthdrs offload registration in out_rt path
    xen-netback: add support for multicast control
    bgmac: Update fixed_phy_register()
    sock, diag: fix panic in sock_diag_put_filterinfo
    flow_dissector: Use 'const' where possible.
    flow_dissector: Fix function argument ordering dependency
    ixgbe: Resolve "initialized field overwritten" warnings
    ixgbe: Remove bimodal SR-IOV disabling
    ixgbe: Add support for reporting 2.5G link speed
    ixgbe: fix bounds checking in ixgbe_setup_tc for 82598
    ixgbe: support for ethtool set_rxfh
    ixgbe: Avoid needless PHY access on copper phys
    ixgbe: cleanup to use cached mask value
    ixgbe: Remove second instance of lan_id variable
    ixgbe: use kzalloc for allocating one thing
    flow: Move __get_hash_from_flowi{4,6} into flow_dissector.c
    ixgbe: Remove unused PCI bus types
    ...

    Linus Torvalds
     

17 Aug, 2015

1 commit


11 Aug, 2015

2 commits


21 Jul, 2015

1 commit


25 Jun, 2015

1 commit

  • Pull networking updates from David Miller:

    1) Add TX fast path in mac80211, from Johannes Berg.

    2) Add TSO/GRO support to ibmveth, from Thomas Falcon

    3) Move away from cached routes in ipv6, just like ipv4, from Martin
    KaFai Lau.

    4) Lots of new rhashtable tests, from Thomas Graf.

    5) Run ingress qdisc lockless, from Alexei Starovoitov.

    6) Allow servers to fetch TCP packet headers for SYN packets of new
    connections, for fingerprinting. From Eric Dumazet.

    7) Add mode parameter to pktgen, for testing receive. From Alexei
    Starovoitov.

    8) Cache access optimizations via simplifications of build_skb(), from
    Alexander Duyck.

    9) Move page frag allocator under mm/, also from Alexander.

    10) Add xmit_more support to hv_netvsc, from KY Srinivasan.

    11) Add a counter guard in case we try to perform endless reclassify
    loops in the packet scheduler.

    12) Extend the flow dissector to be programmable and use it in the new "Flower"
    classifier. From Jiri Pirko.

    13) AF_PACKET fanout rollover fixes, performance improvements, and new
    statistics. From Willem de Bruijn.

    14) Add netdev driver for GENEVE tunnels, from John W Linville.

    15) Add ingress netfilter hooks and filtering, from Pablo Neira Ayuso.

    16) Fix handling of epoll edge triggers in TCP, from Eric Dumazet.

    17) Add an ECN retry fallback for the initial TCP handshake, from Daniel
    Borkmann.

    18) Add tail call support to BPF, from Alexei Starovoitov.

    19) Add several pktgen helper scripts, from Jesper Dangaard Brouer.

    20) Add zerocopy support to AF_UNIX, from Hannes Frederic Sowa.

    21) Favor even port numbers for allocation to connect() requests, and
    odd port numbers for bind(0), in an effort to help avoid
    ip_local_port_range exhaustion. From Eric Dumazet.

    22) Add Cavium ThunderX driver, from Sunil Goutham.

    23) Allow bpf programs to access skb_iif and dev->ifindex SKB metadata,
    from Alexei Starovoitov.

    24) Add support for T6 chips in cxgb4vf driver, from Hariprasad Shenai.

    25) Double TCP Small Queues default to 256K to accommodate situations
    like the XEN driver and wireless aggregation. From Wei Liu.

    26) Add more entropy inputs to flow dissector, from Tom Herbert.

    27) Add CDG congestion control algorithm to TCP, from Kenneth Klette
    Jonassen.

    28) Convert ipset over to RCU locking, from Jozsef Kadlecsik.

    29) Track and act upon link status of ipv4 route nexthops, from Andy
    Gospodarek.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1670 commits)
    bridge: vlan: flush the dynamically learned entries on port vlan delete
    bridge: multicast: add a comment to br_port_state_selection about blocking state
    net: inet_diag: export IPV6_V6ONLY sockopt
    stmmac: troubleshoot unexpected bits in des0 & des1
    net: ipv4 sysctl option to ignore routes when nexthop link is down
    net: track link-status of ipv4 nexthops
    net: switchdev: ignore unsupported bridge flags
    net: Cavium: Fix MAC address setting in shutdown state
    drivers: net: xgene: fix for ACPI support without ACPI
    ip: report the original address of ICMP messages
    net/mlx5e: Prefetch skb data on RX
    net/mlx5e: Pop cq outside mlx5e_get_cqe
    net/mlx5e: Remove mlx5e_cq.sqrq back-pointer
    net/mlx5e: Remove extra spaces
    net/mlx5e: Avoid TX CQE generation if more xmit packets expected
    net/mlx5e: Avoid redundant dev_kfree_skb() upon NOP completion
    net/mlx5e: Remove re-assignment of wq type in mlx5e_enable_rq()
    net/mlx5e: Use skb_shinfo(skb)->gso_segs rather than counting them
    net/mlx5e: Static mapping of netdev priv resources to/from netdev TX queues
    net/mlx4_en: Use HW counters for rx/tx bytes/packets in PF device
    ...

    Linus Torvalds
     

23 Jun, 2015

1 commit

  • Pull crypto update from Herbert Xu:
    "Here is the crypto update for 4.2:

    API:

    - Convert RNG interface to new style.

    - New AEAD interface with one SG list for AD and plain/cipher text.
    All external AEAD users have been converted.

    - New asymmetric key interface (akcipher).

    Algorithms:

    - Chacha20, Poly1305 and RFC7539 support.

    - New RSA implementation.

    - Jitter RNG.

    - DRBG is now seeded with both /dev/random and Jitter RNG. If kernel
    pool isn't ready then DRBG will be reseeded when it is.

    - DRBG is now the default crypto API RNG, replacing krng.

    - 842 compression (previously part of powerpc nx driver).

    Drivers:

    - Accelerated SHA-512 for arm64.

    - New Marvell CESA driver that supports DMA and more algorithms.

    - Updated powerpc nx 842 support.

    - Added support for SEC1 hardware to talitos"

    * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (292 commits)
    crypto: marvell/cesa - remove COMPILE_TEST dependency
    crypto: algif_aead - Temporarily disable all AEAD algorithms
    crypto: af_alg - Forbid the use internal algorithms
    crypto: echainiv - Only hold RNG during initialisation
    crypto: seqiv - Add compatibility support without RNG
    crypto: eseqiv - Offer normal cipher functionality without RNG
    crypto: chainiv - Offer normal cipher functionality without RNG
    crypto: user - Add CRYPTO_MSG_DELRNG
    crypto: user - Move cryptouser.h to uapi
    crypto: rng - Do not free default RNG when it becomes unused
    crypto: skcipher - Allow givencrypt to be NULL
    crypto: sahara - propagate the error on clk_disable_unprepare() failure
    crypto: rsa - fix invalid select for AKCIPHER
    crypto: picoxcell - Update to the current clk API
    crypto: nx - Check for bogus firmware properties
    crypto: marvell/cesa - add DT bindings documentation
    crypto: marvell/cesa - add support for Kirkwood and Dove SoCs
    crypto: marvell/cesa - add support for Orion SoCs
    crypto: marvell/cesa - add allhwsupport module parameter
    crypto: marvell/cesa - add support for all armada SoCs
    ...

    Linus Torvalds
     

04 Jun, 2015

1 commit


02 Jun, 2015

1 commit

  • Conflicts:
    drivers/net/phy/amd-xgbe-phy.c
    drivers/net/wireless/iwlwifi/Kconfig
    include/net/mac80211.h

    iwlwifi/Kconfig and mac80211.h were both trivial overlapping
    changes.

    The drivers/net/phy/amd-xgbe-phy.c file got removed in 'net-next' and
    the bug fix that happened on the 'net' side is already integrated
    into the rest of the amd-xgbe driver.

    Signed-off-by: David S. Miller

    David S. Miller
     

29 May, 2015

1 commit

  • Steffen Klassert says:

    ====================
    pull request (net-next): ipsec-next 2015-05-28

    1) Remove xfrm_queue_purge as this is the same as skb_queue_purge.

    2) Optimize policy and state walk.

    3) Use a sane return code if afinfo registration fails.

    4) Only check for an acquire state if the state is not valid.

    5) Remove an unnecessary NULL check before xfrm_pol_hold,
    as it checks the input for NULL.

    6) Return directly if the xfrm hold queue is empty, and avoid
    taking a lock, as there is nothing to do in this case.

    7) Optimize the inexact policy search and allow for matching
    of policies with priority ~0U.

    All from Li RongQing.

    Please pull or let me know if there are problems.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
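Item 7 touches a common pitfall: when ~0U is used both as "no match found yet" and as a legal priority value, a policy whose priority is ~0U can never win a strict `<` comparison against the sentinel. The sketch below contrasts that sentinel shape with one that compares against the exact-match candidate itself and stops early. It assumes, as a simplification, that the inexact list is sorted by ascending priority value and that a lower value wins; `struct pol_model` and both lookup functions are hypothetical, not the kernel's `xfrm_policy_lookup_bytype()`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical policy entry; a lower priority value wins. */
struct pol_model { uint32_t priority; bool matches; };

/* Sentinel shape: ~0U means "nothing found", so an inexact policy
 * with priority ~0U can never beat it. */
static const struct pol_model *
lookup_sentinel(const struct pol_model *exact,
                const struct pol_model *inexact, size_t n)
{
    uint32_t best = exact ? exact->priority : ~0U;
    const struct pol_model *ret = exact;
    for (size_t i = 0; i < n; i++)
        if (inexact[i].matches && inexact[i].priority < best) {
            ret = &inexact[i];
            best = inexact[i].priority;
        }
    return ret;
}

/* Candidate-based shape: with the list sorted by priority, the first
 * match wins, and the walk stops once no later entry can beat 'exact'. */
static const struct pol_model *
lookup_sorted(const struct pol_model *exact,
              const struct pol_model *inexact, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (exact && inexact[i].priority >= exact->priority)
            break;                   /* nothing later can win */
        if (inexact[i].matches)
            return &inexact[i];
    }
    return exact;
}
```

Both versions agree whenever an exact match exists; they differ only when the sole candidate has priority ~0U, which the sentinel version silently ignores.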
     

28 May, 2015

1 commit